PREFACE


The basic data for this volume were collected during the
standardization testing for the new Stanford-Binet revision.
Although a portion of the material herein is based upon analyses
which were essential in the standardization procedure, a large
part represents analyses which have been made subsequently. Some
of the data pertain to the scale as an instrument, some are
concerned with results obtained thereby. At times the chief
concern is with the scale as a whole, at times with analyses
based on items. It has not been feasible to present more than a
statistical summary of the mass of accumulated data; it is hoped
that sufficient detail has been given to make the discussion
intelligible.

This volume is so closely related to Terman and Merrill's
Measuring Intelligence that the acknowledgments made therein
could be repeated here. In addition, the writer is indebted to
the Social Science Research Council of Stanford University for
support which made the factor analyses possible, and to the
Committee on Psychology and Anthropology of the National Research
Council for a grant-in-aid for the study of scatter. The analysis
of the data for sex differences was financed in part by funds
granted Professor Terman by the Committee on Sex Research of the
National Research Council. I am personally indebted to Professors
Terman and Merrill for their cooperation and encouragement. Much
of the basic work for this volume was done under their direction.
Credit is due Dr. Merrill for choosing the items for the
non-verbal and memory scales, but she should not be held
responsible for my interpretations thereof. Dr. Merrill has also
rendered invaluable assistance in assembling the material for
Appendix C. Much of the responsibility entailed in the
preparation of this volume has been shared by Olga W. McNemar. To
Professor Terman I am grateful for his willingness to write the
introductory chapter and for many helpful suggestions and
criticisms.


Quinn McNemar 
Chapter I


THE REVISION PROCEDURES 
by 


Lewis M. Terman


Dr. McNemar has asked me to prepare an introductory chapter on
the purpose of the New Revision, the selection of test items, and
the procedures employed in getting the standardization data and
constructing the scales. This account can be brief, for the
essential facts have been presented in considerable detail in the
first three chapters of Measuring Intelligence.¹


Purpose and Character of the Revision 


The purpose of the revision was to replace the single
Stanford-Binet scale of 1916 by two alternative scales, different
in content but functionally equivalent in every way, which would
test intelligence more accurately and over a wider range than had
hitherto been possible by scales of the Binet type. The 1916
revision, although successful beyond all expectations of the
author, had a number of defects that needed to be remedied. It
did not extend low enough or high enough, the accuracy of
standardization was uneven, and the procedures for giving and
scoring the individual tests were not always sufficiently
defined. It was intended that the new scales should test as
nearly as possible the same aspects of intelligence as did the
earlier revision. The fact that the latter had proved its value
as a clinical tool and had become the most widely used of
psychometric devices was deemed warrant enough for replacing it
by something similar and better rather than by something
different in kind.

¹ By L. M. Terman and M. A. Merrill. Boston: Houghton Mifflin
Company, 1937.

Although no system of psychological tests is ever as good as one
would like, the main objectives of the revision were reasonably
well approximated. The resulting scales are mutually equivalent
with respect to difficulty, range, reliability, and validity;
they differ from their predecessor chiefly in range, accuracy of
standardization, and refinements of procedure. The 1916 Stanford
revision contained 90 tests (compared with 54 in the 1908 scale
of Binet) and was standardized on 905 subjects in California and
Nevada. Form L and Form M of the New Revision have 129 tests
each, and were standardized by administration of both forms to
3184 subjects selected so as to sample the white child population
of the main geographical areas of the United States. It is not
surprising that an undertaking of such huge proportions required
several years for its completion. The chronology of the revision
was roughly as follows:


The first year was devoted to a search of the literature for test
suggestions . . . new test items, . . . on various procedures for
. . . During the fourth year two experi- . . . detailed
directions were . . . ve full time to the admin- . . . During the
seventh year
Miss Mayer and Mrs. Oden checked the scoring of all responses on
the 6368 blanks, investigated the effects of alternative scoring
procedures, and made up lists of satisfactory and unsatisfactory
responses. Two additional years, the eighth and ninth, were
required for the Hollerith treatment of the data and for making
up successive trial revisions until final scales were derived
which were mutually equivalent and so standardized for difficulty
as to yield mean I.Q.'s of approximately 100 at all age levels.


Selection of Test Items 


During the fifteen years prior to the beginning of work on the
New Revision a large amount of information had accumulated on the
behavior of various types of mental tests. Thousands of
correlation coefficients had been published which threw new light
on the interrelationships of tested abilities and on the relative
merits of specific tests as measures of general intelligence.
Many long-debated questions had been answered by experience. Some
tests formerly regarded as mere tests of erudition had been found
to be highly saturated with a general factor; others formerly
looked upon with favor had been found inferior or near-worthless.
When the present undertaking was launched it was possible to
forecast with considerable assurance whether a new test of a
given type would correlate highly, moderately, or only slightly
with a battery of tests like the 1916 Stanford-Binet. There would
still be surprises, but the period of blind groping was over.

In preparation for the New Revision hundreds of test items were
devised and assembled for preliminary tryout. The criterion for
inclusion in the preliminary series was whether the test would
probably correlate well with the best current batteries of
intelligence tests. Experimental tryout of the new test items
followed. This involved their application to groups of subjects
who had been tested by the 1916 Stanford-Binet. Test items which
showed a rapid rise in per cents passing at successive mental-age
levels were tentatively selected for the final tryout if
otherwise satisfactory. Items that did not satisfy this criterion
were at once discarded.


liminar . . . here it . . . At this level . . . it does not
follow that a steep . . . validity. This lack of mental-age . . .
practicable to base the . . . any particular level.


As for time, it was desired that the examination by the revised
scale should [not extend] much beyond 75 minutes for older
subjects or beyond 50 minutes for the younger. It was thus
necessary to choose between many brief tasks or fewer long ones.
The first of these alternatives has two distinct advantages: it
permits a wider sampling of mental behavior and it makes greater
appeal to the child's interest. Moreover, it is probably
favorable to the validity of the scale as a whole. Consider, for
example, a test that requires 4 minutes and correlates .45 with
total score, and two others that require only 2 minutes each and
yield correlations of only .40 with total score. In such a case
it is quite likely that the two briefer tests combined will have
a higher correlation with total score than will the single
4-minute test. In general it was the policy of the authors to
choose the briefer of two tests when other things were not too
unequal.
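
The arithmetic behind this example can be sketched with the usual formula for the correlation between a criterion and the unweighted sum of two standardized tests. The text does not give the intercorrelation of the two brief tests, so the values of `r12` below are assumptions chosen only for illustration, and the function name is hypothetical.

```python
import math

def composite_correlation(r1: float, r2: float, r12: float) -> float:
    """Correlation of the unweighted sum of two standardized tests
    with a criterion (here, total score).

    r1, r2 -- correlation of each brief test with the criterion
    r12    -- intercorrelation of the two tests (assumed, not from the text)
    """
    return (r1 + r2) / math.sqrt(2.0 + 2.0 * r12)

# Two 2-minute tests, each correlating .40 with total score,
# under three assumed intercorrelations.
for r12 in (0.20, 0.40, 0.60):
    print(f"r12 = {r12:.2f}: composite r = "
          f"{composite_correlation(0.40, 0.40, r12):.3f}")

# Largest intercorrelation at which the pair still matches the
# single 4-minute test that correlates .45 with total score.
break_even = (0.80 / 0.45) ** 2 / 2.0 - 1.0
print(f"break-even r12 = {break_even:.3f}")
```

Under these assumptions the pair of brief tests outperforms the single longer test whenever the two correlate less than about .58 with each other, which fits the "quite likely" of the text. The sketch ignores the fact that each test is itself a part of the total score.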

Although the final trial series contained about 50 per cent more
tests than it was planned to use, so many eliminations were
necessary that at certain levels there was a shortage of good
items and some had to be retained that were of relatively low
validity or otherwise not entirely satisfactory. It may interest
the reader to compare the first-factor loadings of the different
tests as recorded in Tables 29-42 of Chapter IX. Dr. McNemar has
made up lists of tests having highest or lowest first-factor
loadings at the lower, middle, and higher age levels (pp.
124-137).

The scales as published contain 129 tests each. In the final
trial series Form L contained 209 tests and Form M 199. As it
turned out, this seemingly liberal allowance for rejections was
somewhat less than it should have been. The outcome would have
been more satisfactory if a larger margin had been provided, say
an excess of 100 per cent over the number of items that would be
needed. The greater the amount of preliminary sifting the less,
of course, will be the need for excess items in the final trial
forms.

The Binet scale has often been criticized because of its great
variety of brief, disconnected tests, the 'motley array,' as
Spearman scornfully refers to it. According to some critics, if
there is any large general factor measured by such tests it is
purely accidental. They contend that the logical way to proceed
is to devise a few series of tests, each series containing [many]
items of a given kind and so, presumably, [measuring] thoroughly
a given aspect of intelligence. The [method] they recommend has
its advantage in group testing in that it simplifies
administration procedures, but no test of this type has ever been
devised that rivals the Binet scale for clinical use with
children. The latter, for all its 'motley array' of variegated
. . . interesting but also affords a . . . intellectual
development than any of the substitutes that have been suggested.
Binet's abandonment of the attempt to test the intellectual
'faculties' as such was his outstanding contribution to
psychometrics.


The Standardization Group 


. . . 15 to 18, . . . the sexes. All subj[ects] . . . the entire
population as circumstances permitted. These . . . [des]cribed in
Measuring Intelligence (p. . . .) . . . not be repeated here,
. . . as choice of geographical localities, . . . urban,
suburban, and rural communities, selection of schools within a
given community, and methods of obtaining random age samplings
wherever testing was done. It was the age samplings below 7 and
above 14 that gave the most trouble.
During the progress of the field work comparisons were made
between the cumulative sampling and various census
classifications of the general population, so that low spots
might be filled as the testing proceeded. It was not possible,
however, to provide an adequate sampling of rural subjects,
because of the labor and expense involved in moving from one
small school to another. This was less serious than might be
supposed, since it was possible on the basis of found differences
in mean I.Q. between rural and urban subjects to estimate and
allow for the error caused by inadequate rural sampling.

One of the procedures followed deserves special mention, namely,
limitation of the group to subjects who were within one month of
a birthday (or half-year birthday for subjects at ages 1-1/2 to
5-1/2). Besides giving relatively homogeneous age groups, this
procedure provided a more random sampling than would have been
secured by the usual method. The effect, of course, was to
multiply the spread of sampling by 6 from age 6 upward, and by 3
from age 6 downward, thus causing less to depend upon the choice
of a given school or community.
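
The factors of 6 and 3 follow from simple proportions, sketched below. The sketch assumes that "within one month" means one month on either side of the birthday and that birthdays are spread evenly over the year; neither assumption is spelled out in the text, and the function name is illustrative.

```python
from fractions import Fraction

def spread_multiplier(months_between_birthdays: int,
                      window_each_side: int = 1) -> Fraction:
    """How many times more children (and hence schools or communities)
    must be canvassed to obtain a given number of eligible subjects.

    Assumes birthdays are uniformly distributed and that a child is
    eligible when within `window_each_side` months of the birthday.
    """
    eligible_months = 2 * window_each_side
    return Fraction(months_between_birthdays, eligible_months)

yearly = spread_multiplier(12)      # whole-year birthdays, age 6 upward
half_yearly = spread_multiplier(6)  # half-year birthdays, below age 6
print(yearly, half_yearly)
```

Only 2 months in every 12 (or 2 in every 6 at the half-year ages) make a child eligible, so the same number of subjects must be drawn from 6 (or 3) times as wide a pool.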

Apart from the inadequate number of rural subjects it is
difficult to see how the sampling could have been much better
than it was from ages 7 to 14. Above and below these levels,
despite all the precautions taken, it cannot be guaranteed. Below
age 4 and above 14 the sampling is almost certainly skewed in the
direction of high I.Q.'s; . . . a probability that was taken into
account [when the] final forms of the test were made up.


Testing Procedures 


However large and representative the standardization group might
be, the value of the obtained data would depend ultimately on the
expertness with which testing was done. As a result of long
experience in the training of examiners the authors of the
revision were aware of the amount and kinds of error that may be
introduced by careless test procedures. The precautions taken
included (1) the selection of experienced examiners who were
known to be expert and dependable, (2) careful training of the
examiners in the new procedures, (3) wide-range testing, (4)
provision for quiet and seclusion during the examination, (5) the
administration of both scales to each subject within a period of
a week or less, (6) frequent instructions to examiners on methods
of meeting new difficulties encountered in administration or
scoring, (7) the rescoring of all blanks after the field work was
completed,


. . . I.Q. computations, . . . [re]sulting from the fi . . .
Hollerith recording . . .

Needless to say . . . [con]sisted largely i[n] . . . the progress
of . . . chaos of subjectivity resulting from the impromptu
procedures advocated by the author just quoted.¹

¹ G. H. Kent, "Suggestions f[or the Next Revision o]f the
Binet-Simon Scale," Psychol. Rec., 1937, 409-432.
Age Placement of the Tests


Age placement of the tests in the trial forms was 
provisional and was intended chiefly to insure that a sub- 
ject’s range of possible success and failure would be ful- 
ly covered without waste of time by unnecessary testing. 
It was only after the data were in that the relocating of 
tests for the final scales was undertaken. 

The primary objective was an arrangement of the tests that would
yield mean I.Q.'s as near as possible to 100 at all age levels.
This had to be done empirically, for there seems to be no
possibility of a mathematical solution when there are so many
test items and no two of them behave exactly alike with respect
to curves of per cents passing them. The plan which some have
advocated whereby all tests would be located at the age where 50
per cent of unselected subjects pass them simply does not work,
as it yields mental ages that are much too high in the lower
range and much too low in the upper range. For a scale of the
Binet type there is no one 'correct' per cent for locating all
the tests. The fact that adjacent mental ages become
progressively closer together from the lower to the upper ranges,
with the scatter of an individual's performance increasing
correspondingly, means that tests located correctly at a lower
age will show a higher per cent of at-age passes than will a test
correctly placed at a higher age. Moreover, the correct placement
of tests for a particular age depends in part on the tests in the
preceding and succeeding ages. For example, the correct per cent
of at-age passes for tests in year XII depends partly on whether
there are tests at years XI and XIII, as in the New Revision, or
none, as in the original Stanford-Binet.

The relocation of tests was done first for Form L. A tentative
rearrangement was made and the resulting I.Q. distributions by
age were examined. A second revision followed and the new I.Q.
distributions were examined. Several revisions of this kind were
necessary before the standardization of Form L was regarded as
satisfactory. When the task had been completed for Form L it was
possible to standardize Form M at once by matching its tests age
for age with the tests of Form L on curves of per cents passing.

One of the most serious problems was caused by the lack of enough
test items of exactly the right difficulty at certain levels.
This could sometimes be handled by rescoring a test on a
different standard of pass or failure, and thus making it easier
or harder as the situation demanded. In a few cases, however,
relatively inferior tests had to be retained, and there are 16
test items that were included in both scales.¹

The placement of tests at the three adult levels, where the
mental-age scores are fictitious units, was governed by the same
requirement as at other levels, namely, that the resulting mean
I.Q. in the standardization sample should approximate 100 for the
subjects at each chronological age. In computing I.Q.'s of older
subjects, chronological age was dropped off gradually instead of
all at once. Four months were dropped in the 14th year, four in
the 15th, and four in the 16th. That is, maximum C.A. used in the
divisor is now 15. The . . . not true of the 1916 revision . . .
than to increase somewhat the danger that a test may b[e] . . .
been given. It should be understood, moreover, that the order of
difficulty of the tests for our standardization group is not
necessarily that which will be found for special groups such as
older adults, psychopathic subjects, Negroes, Indians, etc.

¹ The duplicated items are scattered over a considerable part of
the scales and are not numerous enough in the range in which a
single subject would be tested to affect appreciably the
correlation between the two scales.
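
The gradual dropping of chronological age can be sketched as follows. The text states only that four months are dropped in each of the 14th, 15th, and 16th years and that the maximum divisor is 15; the sketch assumes the months are dropped evenly within each year (one month in every three beyond 13-0) and that any C.A. of 16-0 or more takes the divisor 15-0. Function names are illustrative.

```python
def divisor_ca_months(ca_months: float) -> float:
    """Chronological age (in months) to use as the I.Q. divisor.

    Assumption: between 13-0 and 16-0 one month in every three is
    dropped, so that four months are lost in each of the three years.
    """
    start, end = 13 * 12, 16 * 12
    if ca_months <= start:
        return ca_months
    if ca_months >= end:
        return 15 * 12
    return start + (ca_months - start) * 2.0 / 3.0

def iq(ma_months: float, ca_months: float) -> float:
    """I.Q. as 100 x M.A. / (adjusted C.A.), both in months."""
    return 100.0 * ma_months / divisor_ca_months(ca_months)

for years in (12, 13, 14, 15, 16, 18):
    d = divisor_ca_months(years * 12)
    print(f"C.A. {years}-0 -> divisor {int(d // 12)}-{int(d % 12)}")
```

Under these assumptions a subject aged 14-0 is divided by 13-8, one aged 15-0 by 14-4, and any subject aged 16-0 or over by 15-0.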


Age Scales versus Point Scales 


The choice between age scale and point scale is not necessarily a
choice between M.A. and I.Q. scores on the one hand and point
scores on the other. Any point scale designed for children will
of course be provided with age norms, and these age norms are
nothing more nor less than mental-age scores. Moreover, the
inevitable comparison of M.A. and C.A. brings us back to I.Q.
scores. However strongly the makers of point scales condemn the
M.A. and I.Q. as measuring units, the users of such tests always
revert to them, at least as long as they are dealing with
children.

Some point-scale authors are so allergic to Binet I.Q.'s that in
trying to avoid them they fall into amusing statistical pitfalls.
For example, Wechsler¹ proposes that instead of expressing I.Q.
as M.A./C.A. it should be expressed as (attained or actual
(point) score) / (expected mean (point) score for age). This
method leads to interesting results when the point scale has an
arbitrary zero point, as nearly all of them do. By arbitrary zero
point is meant one that represents something higher than zero
ability. Such a scale is analogous to a measuring-stick on which
24 inches is called zero, 25 inches is called 1 inch, 26 inches
is called 2 inches, etc. When the height of an infant is measured
by such a stick and turns out to be 1 inch (really 25 inches),
whereas the norm for his age is 2 inches (really 26 inches), the
infant's Height Q. is exactly 50. If measured height had been .2
inch (really 24.2 inches), the H.Q. would have been 10. That is,
height deviations within the small range of 1.8 inches from the
norm would yield H.Q.'s from 10 to 100. Vocabulary quotients
exactly analogous to these absurd height quotients have actually
been used.

¹ David Wechsler, The Measurement of Adult Intelligence.
Baltimore: Williams and Wilkins Company, 1939; see p. 25.
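
The height-quotient illustration can be checked numerically. The sketch below uses only the figures given in the text (a stick whose zero mark sits at 24 inches, a norm of 26 inches); the function and parameter names are illustrative.

```python
def quotient(observed: float, norm: float, hidden_zero: float = 0.0) -> float:
    """100 x (scale reading) / (scale norm), where the scale reports
    each true value minus `hidden_zero`.

    With hidden_zero = 0 the scale has a true zero point; with
    hidden_zero > 0 it has the arbitrary zero described in the text.
    """
    return 100.0 * (observed - hidden_zero) / (norm - hidden_zero)

NORM = 26.0       # true norm for the infant's age, in inches
STICK_ZERO = 24.0  # true height at which the stick reads zero

for true_height in (24.2, 25.0, 26.0):
    print(f"true height {true_height:4.1f} in.: "
          f"H.Q. on arbitrary-zero stick = "
          f"{quotient(true_height, NORM, STICK_ZERO):5.1f}, "
          f"on true-zero stick = {quotient(true_height, NORM):5.1f}")
```

On the arbitrary-zero stick the three infants receive quotients of 10, 50, and 100, whereas measured from a true zero they differ by only a few points (about 93, 96, and 100).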

As we have noted above, however, point scores can be converted
into M.A.'s and I.Q.'s which are comparable with the M.A.'s and
I.Q.'s of an age scale of the Binet type. In view of the fact
that a point scale is more easily standardized, since it is not
necessary to juggle the tests about until they have been properly
assigned to age groups, the question may be raised why the
authors of the New Revision went to so much unnecessary labor to
produce age scales. The answer is that the age arrangement is
preferred by a majority of examiners because it enables them to
follow more intelligently the test-by-test progress of a subject
during the course of the examination. The authors believe that
this advantage of the age scale warranted its extra cost in time
and labor.

For a mathematical discussion of units of measurement the reader
is referred to Dr. McNemar's treatment of the subject in Chapter
XI.


Outcome of the Standardization 


No extended discussion of the results of the standardization is
here necessary since the problem is treated from various angles
in the chapters that follow. A few points may perhaps be
emphasized without duplicating unduly the material of other
chapters.

Accuracy of Standardization. - The authors purposely adjusted the
scales so that mean I.Q.'s of the standardization group would be
slightly above 100. The main justification for this was the
inadequate sampling of rural subjects, previously referred to.
Additional allowance in this direction was made for ages below 4
and above 15 because of reasons for believing that the sample
tested was definitely superior at these levels. The means for the
two scales run closely parallel and the smoothed means for the
composite of L and M (Measuring Intelligence, page 36) give the
truest picture of the accuracy with which the standardization
fits the sample tested. The greatest difference between any two
means in the entire range is 4.3 I.Q. points. From ages 4 to 15
inclusive the greatest difference is 3.2 points. Of the fourteen
means in this range, ten differ from 100 I.Q. by 0.0 to 1.6
points and four by 2.0 to 3.0 points.

Reliability. - It would appear from the data presented in Chapter
VI that a degree of reliability has been attained that would be
difficult to exceed without extending the examination beyond
practicable time limits. The reliabilities have been expressed in
terms of both σe (standard error of measurement) and the
equivalent reliability coefficients. An important fact brought
out by Dr. McNemar is that σe varies directly with size of I.Q.
It may be reassuring to clinicians who work largely with backward
subjects to know that the New Revision is particularly reliable
at low I.Q. levels.
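
The link between a standard error of measurement and an "equivalent" reliability coefficient is conventionally written as the S.D. times the square root of one minus the reliability. Whether Chapter VI uses exactly this form is not shown here, so the sketch below is an assumption; the S.D. of 16 and the two standard errors are hypothetical values chosen only to show the direction of the effect.

```python
import math

def sem_from_reliability(sd: float, reliability: float) -> float:
    """Standard error of measurement implied by a reliability coefficient."""
    return sd * math.sqrt(1.0 - reliability)

def reliability_from_sem(sd: float, sem: float) -> float:
    """Reliability coefficient equivalent to a given standard error."""
    return 1.0 - (sem / sd) ** 2

SD = 16.0  # hypothetical S.D. of I.Q.; not taken from the text

# If the standard error grows with I.Q., the equivalent reliability
# (against a fixed S.D.) is higher at the low end of the range.
for label, sem in (("low I.Q. level", 3.0), ("high I.Q. level", 5.0)):
    print(f"{label}: standard error {sem:.1f} -> "
          f"equivalent reliability {reliability_from_sem(SD, sem):.3f}")
```

With these illustrative figures a standard error of 3 points corresponds to a reliability near .96 and one of 5 points to about .90.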

Attention may be called to the fact that since all subjects were
given both the L and M scales, and since the tests were
administered by expert examiners, the data give an exceptionally
accurate picture of the reliability factor.

Validity. - The somewhat futile war of words regarding the
validity of this or that intelligence test has died down as a
result of the growing custom of defining validity in operational
terms. A test tests what it tests, and the nature of this 'what'
only becomes clear as the test is used and the results checked.
Here it is perhaps sufficient to note that Forms L and M
correlate with the 1916 revision about as highly as their
respective reliabilities permit. The new scales test whatever the
old one tested, but with somewhat greater accuracy.

The fourteen factor analyses which Dr. McNemar has made of the
tests at successive levels throw some light, despite their avowed
limitations, upon factors measured. In particular they indicate
that at no level do the tests measure a medley of factors as some
have believed; everywhere there is one factor that stands out
clearly with only occasional and none too reliable evidence for a
second or third factor. Furthermore, by the ingenious method of
having the adjacent analyses overlap in respect of test items
included, Dr. McNemar has demonstrated that sudden changes in the
nature of the primary factor do not occur from level to level.
His data in fact indicate, although they do not fully
demonstrate, that the primary factor is the same at widely
separated levels. This tentative conclusion could be checked by
retesting a large group of subjects over a period of ten or
fifteen years.


* * * * *


This report on the standardization data from which the 1937
Stanford-Binet Scales were derived has long been overdue. Its
appearance at this time has been made possible by the fact that
Dr. McNemar, on the urgent request of the New Revision authors,
consented to write the several chapters for which they had
originally been scheduled. The delay in publication, although
regrettable, is offset by the high professional quality of the
job Dr. McNemar has done.
Chapter II
ON THE DISTRIBUTION OF I.Q.'S


Since the introduction of Pearson's system of frequency curves, a
large amount of energy has been expended in graduating frequency
distributions. Many early biometric studies had as their sole
objective the exact mathematical specification of the
distribution of some characteristic. These studies definitely
showed that the types of distribution are many, and that the
normal curve is far from being the rule. It was only natural for
psychologists to be influenced by the early work of the
biometricians, but unlike their predecessors, the psychologists
seemed to find normal distributions more often than other types.
This led to the assumption that the normal curve was the ideal,
and that exceptions thereto should be explained. It also led some
to speculate, via the binomial expansion, as to the nature of the
constituent elements of mental life.

That the shape of the distribution for any psychological trait is
partly dependent upon the units of measurement employed is so
axiomatic that one wonders why it should have ever been assumed
that deductions could be made concerning whether the underlying,
indirectly measured, trait is normally distributed. Furthermore,
the ease with which the shape of a distribution can be altered by
a change in test difficulty should also have served as a warning
to those who were out to demonstrate the normal law for
psychological traits. It is our contention, certainly not
original with us, that nothing can be inferred from the
distribution of measured psychological traits with regard to the
shape of the distribution which would result if we ever found a
psychometric of truly equal units.

The most ambitious attempt to show that intelligence is normally
distributed is to be found in Appendix III of Thorndike's
Measurement of Intelligence. Therein are given distributions for
single tests which vary to an unspecified degree from that
expected on the basis of the normal law. Composite distributions,
secured by averaging the frequencies for single tests, were so
closely fitted by the normal curve as to leave no doubt in one's
mind about the goodness of fit - too good to be above suspicion.
His very high chi square probability figures of order .99 to
.999999 simply indicate either statistical misapplication or an
undue restraint on the operation of chance. It happens that the
process of averaging frequencies does interfere with the type of
chance discrepancies which might be tested by chi square; hence
we do have a faulty use of statistical method. Aside from the
unacceptable chi square probabilities, the fact remains that the
histograms based on composites at least appear to be normal. It
seems safe to assume that the makers of the several single tests
or scales were guided more or less by the practical advantage
inherent in a test which yields a normal distribution, and it
seems likely that various factors are involved in the lack of
normality for the several single curves. . . . such curves are
averaged, one might expect a balancing of these factors; so . . .
unless one is willing t[o] . . . frequencies (ordinates) . . . an
argument.

Now Thorndike has faced the problem of inequality, or the unknown
equivalence, of units of [measurement]. His Chapter VIII is
devoted to the proposition that intelligence is distributed
normally provided the original score units are first transmuted
into a scale of 'truly equal units.' Since this transmutation is
accomplished by means of normal curve functions, one need not be
surprised to find that the distributions, so adjusted, seem to
substantiate the normal hypothesis. This is especially true (and
to be expected) when composites, including the total group upon
which the scaling was based, are considered. This entire
procedure is another example of man trying to lift himself by
tugging at his own boot straps. It is our opinion that we have no
exact knowledge about the distribution of intellect, and that
Thorndike has only demonstrated the obvious: a normal curve can
be produced by assuming it in advance.
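
The circularity can be illustrated with a small sketch. It is not Thorndike's actual scaling procedure; it uses a simple rank-based normalizing transformation as a stand-in, applied to an artificial and deliberately skewed set of scores. All names and data are illustrative assumptions.

```python
from statistics import NormalDist

def normal_scores(values):
    """Replace each score by the normal deviate of its percentile rank.

    Assumes no tied scores. Any set of distinct scores, whatever its
    original shape, comes out symmetrical and bell-shaped.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    z = [0.0] * n
    for rank, i in enumerate(order, start=1):
        z[i] = NormalDist().inv_cdf((rank - 0.5) / n)
    return z

def skewness(xs):
    """Moment coefficient of skewness, m3 / m2**1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

raw = [i ** 3 for i in range(1, 201)]  # artificial, strongly skewed scores
scaled = normal_scores(raw)

print(f"skewness of raw scores:    {skewness(raw):.3f}")
print(f"skewness of scaled scores: {skewness(scaled):.3f}")
```

The raw scores are markedly skewed, yet the transformed scores show essentially zero skewness. The normality of the result reflects the transformation, not the trait.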

We have already indicated another way of securing a normal curve,
namely, by selecting items which are of medium difficulty for a
given group. A test so constructed will yield distributions
closely approximating the normal form for groups which are
similar in level of ability and homogeneity to the starting
group, but will likely yield skewed distributions for groups of
other ability levels. There can be no objection to this procedure
provided no claims are made regarding the normality of the
underlying trait. In fact, it is convenient to have a normal
distribution for a trait as measured since the statistical
aspects of the Gaussian curve are relatively simple and generally
known. In revising the Binet scale, no attempt was made, as
erroneously claimed by some, to secure a normal distribution of
the resulting I.Q.'s. Items were chosen in such a manner that the
average M.A. for an age group coincided with their C.A. The per
cent passing with age curves for the items were ogival, but not
necessarily normal ogives. The one thing about the scales which
would tend to produce a normal curve of distribution is the fact
that each age group is tested by items which cover the entire
range of difficulty. This means that in testing a group of a
given age, the varying difficulty of all the items attempted is
such that it can be said that on the average the items are of
medium difficulty for that particular group. Consequently one
might very well expect the M.A.'s and I.Q.'s for a single age,
and the I.Q.'s for several ages combined, to approximate
normality.

Frequency distributions for Forms L and M sep- 
arately for three age groupings and for all ages com- 
bined are presented in Tables 1 and 2. At the bottom of 
each table will be found the N’s, means, S.D.’s, measures 
of skewness and kurtosis (based on moments) and their 
standard errors,’ and the chi Square probabilities for ob- 
taining as large discrepancies from the best-fitting normal 
curves. (The grouping of end intervals for the computa- 
tion of chi square is indicated by braces.) When the val- 
ues for é, (skewness) and £, kurtosis) do not depart sig- 
nificantly from zero, it can be said that the hypothesis of 
normality is tenable provided the chi square probability 
› .01. For all ages combined, the 


From the data of Tables 1 and 2, it seems safe to set forth
the following brief summary: For ages 2-1/2 to 5-1/2, both
forms, the ... reasonably symmetrical but more peaked ...
For ages 6 to 13, again both ... kurtosis are not significantly
... the irregularities are highly ... the small chi square
probability figures. For ages 14 to 18, it can be said that
both ... ages 2-1/2 to 18 combined, the ... in skewness, but are
more peaked than expected on the basis of the normal curve.
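
The standard errors of the skewness and kurtosis measures depend only on the number of cases, so the entries at the foot of Table 1 can be checked roughly. The following is a minimal sketch and not the author's computation: it assumes the large-sample approximations sqrt(6/N) and sqrt(24/N) in place of the exact formulas in Fisher, and the function names are hypothetical.

```python
import math


def se_skewness(n: int) -> float:
    """Large-sample standard error of the skewness measure g1: sqrt(6/N)."""
    return math.sqrt(6.0 / n)


def se_kurtosis(n: int) -> float:
    """Large-sample standard error of the kurtosis measure g2: sqrt(24/N)."""
    return math.sqrt(24.0 / n)


def ratio_to_se(value: float, se: float) -> float:
    """Ratio of a statistic to its standard error (departure from zero)."""
    return value / se


# (N, g1, g2) as printed at the foot of Table 1 for two of the groupings.
FORM_L = {
    "ages 2-1/2 to 5-1/2": (728, -0.131, 1.117),
    "ages 2-1/2 to 18": (2970, 0.028, 0.346),
}

for label, (n, g1, g2) in FORM_L.items():
    skew_ratio = ratio_to_se(g1, se_skewness(n))
    kurt_ratio = ratio_to_se(g2, se_kurtosis(n))
    print(f"{label}: g1/SE = {skew_ratio:.2f}, g2/SE = {kurt_ratio:.2f}")
```

In both groupings the skewness ratio is small while the kurtosis ratio is several times its standard error, which agrees with the description of these distributions as symmetrical but more peaked than the normal curve.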


¹ Cf. pp. 78-79 in R. A. Fisher's Statistical Methods for
Research Workers (6th Ed.), Edinburgh.


Fig. 1. Form L I.Q. distribution and best-fitting
normal curve, ages 2-1/2 to 18 (vertical axis: per cent of
cases, 0 to 10; horizontal axis: I.Q., 60 to 140)

Fig. 2. Form M I.Q. distribution and best-fitting
normal curve, ages 2-1/2 to 18

The exhibited departures from normality are
scarcely large enough to disturb practical interpretations
of I.Q.'s as though normally distributed. The degrees of
skewness, kurtosis, and irregularity of the distributions
are certainly not so great as to invalidate the use of
sampling and correlational techniques in researches
involving the Stanford-Binet I.Q. as a variable.

We draw no conclusions from these data concerning
the distribution of intellect. As measured by the New
Stanford Revision of the Binet tests, I.Q.'s are approximately
normal in distribution, but we make no claims
for the equality of the units involved.

Incidentally, the fact that the means in Tables 1
and 2 are above 100 should not lead the reader to the
erroneous conclusion that the average I.Q. for the population
now exceeds 100. The excess here observed is in
the proper direction to allow for known bias in our age
samplings. When an adjustment is made for bias in
occupational status, the age means approach nearer 100,
and a further adjustment for inadequate rural representation
would tend to bring the values still closer to 100.

ON THE DISTRIBUTION OF I.Q.'S
TABLE 1

FREQUENCY DISTRIBUTIONS AND CONSTANTS
FOR FORM L I.Q.'S

I.Q.      2-1/2-5-1/2     6-13    14-18   2-1/2-18
170                          1                   1
165                 1        0                   1
160                 3        1                   4
155                 2        1                   3
150                 2        2                   4
145                 5        7        2         14
140                 6       22        4         32
135                12       31       14         57
130                29       42       12         83
125                35       65       28        128
120                47       75       51        173
115                87      145       53        285
110                88      163       68        319
105                93      184       52        329
100               100      223       58        381
95                 47      150       87        284
90                 60      185       61        306
85                 44      125       47        216
80                 32       90       31        153
75                 14       43       26         83
70                  6       33       13         52
65                  4       19        1         24
60                  3       11        8         22
55                  3        2        2          7
50                  2        2        0          4
45                  1        0        0          1
40                  1        1        1          3
35                  1                            1

N                 728     1623      619       2970
M              106.58   103.22   103.03     104.00
S.D.            17.32    16.83    16.89      17.03
g1              -.131     .133    -.090       .028
σ g1             .091     .061     .098       .045
g2              1.117     .209    -.150       .346
σ g2             .182     .122     .196       .090
P                .006     .003      .06        .03

End intervals braced for chi square (grouped frequency in
parentheses): ages 2-1/2 to 5-1/2, 145 and above (13), 65 and
below (15); ages 6 to 13, 145 and above (12), 60 and below (16);
ages 14 to 18, 135 and above (20), 65 and below (12); ages
2-1/2 to 18, 150 and above (13), 55 and below (16).
TABLE 2

FREQUENCY DISTRIBUTIONS AND CONSTANTS
FOR FORM M I.Q.'S

I.Q.      2-1/2-5-1/2     6-13    14-18
165                          1
160                 2        0
155                 2        3
150                 3        5      ...
145                 4        7      ...
140                 8       25      ...
135                 9       19        8
130                24       38       22
125                35       63       23
120                62      107       40
115                75      151       62
110                92      166       64
105                73      188       56
100               112      212       78
95                 67      160       66
90                 59      182       58
85                 35      117       54
80                 28       79       25
75                 18       43       24
70                 10       25       10
65                  3       21        9
60                  1        7        7
55                  2        2        1
50                  1        1        0
45                  2        0        1
40                  0        1
35                  1

N                 728     1623      619
M              106.42   103.96   103.39
S.D.            16.72    16.55    17.11
g1              -.109     .125    -.050
σ g1             .091     .061     .098
g2               .925     .218      ...
σ g2             .182     .122     .196
P                 .09     .009     .437

End intervals braced for chi square (grouped frequency in
parentheses): ages 2-1/2 to 5-1/2, 140 and above (19), 65 and
below (10); ages 6 to 13, 145 and above (16), 60 and below (11);
ages 14 to 18, 140 and above (11), 65 and below (18).


Chapter III


ANALYSIS BY AGE-GRADE 


In the previous volume, Measuring Intelligence,
means and standard deviations for each of the twenty age
groups were given for Forms L and M separately. It is
the purpose of this chapter to present the data in terms
of I.Q. and M.A. for grade and age-grade groupings. We
shall be concerned only with those subjects of ages 6 to
18 who were in grades 1 to 12. Since the two forms are
so similar that parallel analyses would become highly
repetitious, and since scores based on an average of the two
forms are more reliable, L-M composite¹ M.A.'s and
I.Q.'s are utilized here.

¹ This composite is actually based on penultimate scoring.
Minor changes in final scoring procedure for the final
normative data resulted in slight changes in the I.Q.'s of a large
proportion of the subjects, but these changes were such that
shifts occurring in means and standard deviations were
negligible as judged from data on age groups 8 to 15 for which
comparisons could easily be made. Since the discrepancies
in the means and S.D.'s for these eight groups were .5 or
less, it would seem permissible to use the penultimate scores
in the analyses for age-grade and also for occupational status.
The penultimate scores are not being used by preference but
because the final scores were not on the particular Hollerith
cards which contained grade and occupational information.

The age-grade distribution of the subjects who were
in grades 1 to 12 and of ages 6 to 18 is set forth in Table
3. At this place it should be recalled that the selection
of subjects, particularly for ages 6 to 14, was such that
no selective factors within a school were operative: all
children in a given school who were within one month of
a birthday were utilized regardless of their grade
location. The schools chosen were of average social status
for their communities, which had been selected so as to
yield representative samplings. The fact that different
communities and schools are here involved does place
one limitation on an analysis of intelligence by age-grade;
namely, the likely differences in promotion practices in
the various school systems. The frequencies in Table 3
cannot be thought of as representing adequately any one
community, nor can one be sure of their generality. It
is of interest, however, to note that the figures in Table
3 for the elementary grades show essentially the same
features as the age-grade distribution of 500,000
California children in 1921-1922.¹

If one examines Table 3 it will be seen that the
6-year-old subjects were mostly in grade 1 (50 were in
kindergarten, and 22 not in school), and that more of the
7-year-olds were in grade 1 than in grade 2. Thence
up the diagonal the modal frequencies occur at grade 2,
age 8; grade 3, age 9, etc. About these modal age-grade
groups the distributions for constant age and for constant
grade tend to be skewed. The extremes of retardation are
greater than the extremes of acceleration. Furthermore,
if the modal values be regarded as representing normal
school progress, the number accelerated by one grade
exceeds the number retarded by one grade.

¹ See T. L. Kelley, "Ridge-Route Norms," Harvard Educ. Rev.,
1940, 10, 309-314.

Table 4 contains the means for L-M composite
I.Q.'s by age-grade, by grade, and by age (the means by
age, it should be noted, are for school cases only). There
would seem to be a slight relationship between grade
location and mean I.Q. (see right-hand margin), even though
I.Q. is not related to age except for the slightly higher
values for ages 17 and 18 (see bottom margin). The
striking, though not surprising, facts in Table 4 are to
be observed by following up the modal age-grade diagonal,
beginning with the cell for age 7 and grade 1. The means
for the modal groups are from 0 to 4 points below the
respective age averages. This fact, coupled with the
data of Table 3, which show that the frequency for the
next grade above the modal grade for a given age is fairly
large compared to the frequency in the modal grade,
would suggest that the modal age-grade groups do not
really represent normal progress pupils in the sense of
an average. It would seem likely that modal values based
on individuals who were within one month of 7-1/2, 8-1/2,
etc. years of age, rather than 7, 8, etc., would provide
modal groups which could be taken as a more exact
index of normal progress through school.

Turning again to Table 4, it will be observed that
those who are accelerated by one grade (we are here
accepting the modal age-grade as the norm) tend in general
to average about 11 I.Q. points above the normal
progress groups, while those who are retarded by one
grade are about 11 points lower than the normal. Those
cases who are accelerated by two grades and those
retarded by two grades deviate about 22 points from the
normals, while those children (too few to justify reporting
means) whose acceleration or retardation is greater
than two grades tend to possess I.Q.'s still farther from
those of the modal groups. It is thus seen that school
progress for individuals of a given age is definitely
related to, if not dependent upon, the intelligence factor.
We hasten to point out, however, that we have been dealing
with averages and that our figures do not indicate
any greater predictability for individuals than has
heretofore been possible. The standard deviations (see Table
5) for age-grade I.Q. distributions indicate definitely that
age-grade location will not permit a very accurate
prediction of an individual's I.Q., and from this we know
that the reverse prediction will be far from perfect.

The standard deviations in Table 5 show no particular
trend except that, as regards I.Q., an age-grade
group is more homogeneous than either an age or a
grade group. Considered along with the means of Table
4, they provide an indication of the amount of overlapping
between the groups of dissimilar school acceleration or
retardation.

Our discussion of the results in Table 4 has so
far been chiefly concerned with comparisons about the
diagonal in a vertical direction, i.e. with the I.Q.'s of
those of a given age but with grade location varying. Let
us now re-examine the table, with particular attention to
the horizontal rows. For a given grade, age varies, and
the mean I.Q. varies inversely with age: the younger in
a grade have I.Q.'s above average, while the older have
lower I.Q.'s. It does not follow from this that a grade
group is more heterogeneous than an age group as regards
the I.Q. index of brightness. The standard deviations
along the right-hand and bottom margins of Table 5
show that grade groups do not differ appreciably or
systematically from age groups in I.Q. variability. Nor does
the presence of younger bright children and older dull
children in a given grade indicate, ipso facto, that the
older dull are handicapped in their competition with the
younger and brighter individuals.

Actually, when we turn to Table 6, wherein will be
found the mental age means by age, by grade, and by
age-grade, we see that the mental maturity of the younger is
only slightly greater than that of the ... same grade, ...
tardation of o... individuals of ... the variability ... of fact,
... ade automatically by age only. This is clearly evident
from Table 7; the S.D.'s for M.A.'s by grade (right-hand
margin) are smaller than the corresponding values for
age groups, except for the first two or three grades which
are less affected by acceleration and retardation.
Thus it is seen that promoti... materially decreased the
wide var... within a grade from that which w... promotions
were made entirely on the basis of age. The extent
of such variation is summarized in Table 8, which
shows for each grade the M.A. distance separating the
highest sixth from the lowest sixth and the highest 2 per
cent from the lowest 2 per cent. It will be noted that
the gap between highest sixth and lowest sixth starts at
1.8 M.A. years in the first grade, increases to 2.8 in the
fifth grade, and to 4.0 in the eighth grade. The gap
between the highest 2 per cent and the lowest 2 per cent
starts at 3.6 M.A. years in the first grade, increases to
5.6 years in the fifth and to 8.0 years in the eighth.
Beyond the eighth grade, there is little tendency for increase
in variation.
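
The gaps quoted from Table 8 stand in a ratio of exactly 1 to 2 in each of the three grades mentioned. One reading consistent with this, offered here as an assumption and not something the text states, is that the highest and lowest sixths were marked off at one S.D. above and below the mean and the extreme 2 per cents at two S.D.'s, as would hold approximately for a normal distribution. A minimal sketch of that reading, with hypothetical function names:

```python
import math


def upper_tail(z: float) -> float:
    """Proportion of a normal distribution lying above z standard deviations."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def implied_sd(sixth_gap: float, two_per_cent_gap: float) -> tuple:
    """S.D. in M.A. years implied by each gap, ASSUMING the gaps are 2 and 4 S.D.'s."""
    return sixth_gap / 2.0, two_per_cent_gap / 4.0


# Gaps in M.A. years quoted in the text: grade -> (sixths, 2 per cents).
TABLE_8_GAPS = {1: (1.8, 3.6), 5: (2.8, 5.6), 8: (4.0, 8.0)}

print(f"above +1 S.D.: {upper_tail(1.0):.3f} (one sixth is {1 / 6:.3f})")
print(f"above +2 S.D.: {upper_tail(2.0):.3f}")
for grade, gaps in TABLE_8_GAPS.items():
    print(f"grade {grade}: implied S.D. from each gap = {implied_sd(*gaps)}")
```

Under this assumption the two gaps for a grade imply the same S.D. of mental age, which would then roughly double between the first and the eighth grade.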

In view of the relatively large N's on which the
above data are based, the representative nature of the
sampling, and the accuracy of the mental-age measure
(composite of Forms L and M), there can be no doubt
that Table 8 gives an essentially true picture of the
results of grading practice in the schools of this country
during the 1920's. The facts speak for themselves. A
teacher confronted with the necessity of teaching en masse
third-grade children ranging in mental age from 6 to 12
years, or sixth-grade children of mental ages from 8 to
16 years, must indeed be versatile if she is to provide
learning situations appropriate for all. It is interesting
that the widespread use of group mental tests and
standardized achievement tests for fifteen years prior to the
collection of these data seems to have had very little
effect upon the heterogeneity of mental ages in a given
school grade. The situation revealed in Table 8 in fact
parallels rather closely that found by Terman and his
associates between 1915 and 1920.¹


¹ See especially the following references:

Virgil Dickson, Mental Tests and the Classroom Teacher.
Yonkers, New York: World Book Company, 1923. L. M. Terman,
The Intelligence of School Children. Boston: Houghton Mifflin
Company, 1919. L. M. Terman et al., The Stanford Revision
and Extension of the Binet-Simon Scale for Measuring
Intelligence. Baltimore: Warwick and York Inc., 1917.

The fact of individual differences and the question
as to what the schools should do about them have provoked
far more discussion than research. Few would deny that
it is the responsibility of the school to adjust to the
developmental needs of the child, but there would seem to
be little unanimity as to what type of adjustment is best
from the child's standpoint. It is not our purpose to
argue here the pros and cons of the various provisions
which have been made for the bright and dull. A few
observations, however, may be in order. We agree with
those who believe that retardation does not solve the
problem of the sub-average. Their all-round development
might be served better by regular promotions with
extensive adjustment of content and activity. That marked
acceleration of the above-average may not be the best
solution is indicated by the judgments of a large proportion
of the Terman gifted group. Some 80 per cent now,
as adults, believe that rapid school acceleration was
harmful for them. Again it appears that adjustments in
the provisions for learning should be made without too


Chapter IV


URBAN-RURAL, OCCUPATIONAL, AND SIBLING 
RELATIONSHIPS 


A quarter of a century ago such an accumulation 
of data as we can here present would have been hailed 
as definite proof that intellectual differences have an 
hereditary basis, but at the present time these data will 
not be regarded as of crucial significance in a field of 
controversy. The results, however, are of interest in 
that they provide rather definite information on certain 
concomitants of I.Q. variation, even though one cannot
conclude therefrom that I.Q. variation is determined any 
more by hereditary factors than by the cultural milieu 
provided by parents. 

All the data in this chapter except those for siblings
are based upon penultimate scoring (see the first
footnote in Chapter III). The age samplings, it will be
recalled, were such as to yield a slight bias as regards
occupational status of the father. We have no reason to
suspect that within an occupational group any selective
factors have been operative, except in the case of the
Denver preschool group (see Measuring Intelligence, page
18), which is not being included in the occupational and
urban-suburban-rural comparisons. The problem of
securing samplings which will be representative of the
urban, suburban, and rural populations is complicated by
the possibility of differences between urban, or suburban,
or rural communities. Our samplings are not adequate
for the rural population; the extremely rural regions
are not sufficiently well represented. It is because of
this bias that Terman and Merrill were willing to tolerate
I.Q. means in excess of 100 even after adjustment
for the bias in sampling as regards occupational status
(see Measuring Intelligence, Table 6).

To secure an adequate sampling of rural children
is especially difficult because of the vast differences in
so-called rural communities. For example, a certain
California community has all the superficial characteristics
of ruralism, but upon closer scrutiny it is found to
be highly residential (retired people, long-distance
commuters, ranch hobbyists, etc.) as compared to a rural
region of the Dakotas or up-state New York or interior
California. We stress the difficulty involved in rural
sampling so that the reader will not draw any unqualified
conclusions from the data on urban, suburban, and rural
groupings.


I.Q.'s of Urban, Suburban, and Rural Children


The samplings for urban children and the number
of cases by communities were as follows: Denver, 111
(excluding preschool as highly selected); Minneapolis, 183;
New York, 48; Reno, 112; Richmond, Virginia, 187; San
Antonio, 254; and San Francisco, 527. These cities should
provide a fair urban cross-section. We are not reporting
data separately by communities, but it can be noted that
differences between the mean I.Q.'s for these cities tend
to be small.

Into the suburban classifications we place the
following communities: White Plains, New York, 160;
Redwood City, California, 134; Los Gatos, California, 314;
and four small communities just out of Kansas City in
Johnson County, Kansas, with 199 cases drawn from
Westwood View, Hickory Grove, Roseland, and Shawnee
Mission schools. We admit to some arbitrariness in
this grouping, but all these communities tend to be
between urban and rural as regards their residential
character and dependence upon larger near-by cities.

The samplings from rural communities include 85
from the Mount Washington School, Bullitt County, and
Liberty School, Oldham County, Kentucky. A total of 152
were drawn from the following districts of Indiana:
Prather School, Charlestown schools, and Borden High
School in Clark County, Palmyra School and Morgan
Township School in Harrison County, and Galena School in Floyd
County. A farming region at Bloomington, Minnesota,
supplied 92 cases; the farming and small village
community of Randolph, Vermont, provided 275; and 65
subjects were secured in the vicinity of Atlee, Virginia. We
have already expressed some skepticism concerning the
representativeness of these communities.

Data on I.Q.'s for children in urban, suburban, and
rural communities are presented in Table 9 for three age
groupings. It will be seen that there is no appreciable
difference between urban and suburban, a fact which
should evoke no surprise. The mean for rural children
of ages 2 to 5-1/2 is much nearer the urban and
suburban averages than one would expect. As already
implied, we believe that more adequate rural samplings
would result in lower means than those given in Table 9.


TABLE 9

I.Q. DATA FOR URBAN, SUBURBAN, AND RURAL CHILDREN
(Denver 2 to 5-1/2 year-olds excluded)

Ages        Group          N        M       σ
2-5-1/2     Urban        354    106.3    15.7
            Suburban     158    105.0    16.1
            Rural        144    100.6    15.4
6-14        Urban        864    105.8    14.7
            Suburban     537    104.5    16.8
            Rural        422     95.4    15.5
15-18       Urban        204    107.9    16.5
            Suburban     112    106.9    15.7
            Rural        103     95.7    15.9


Occupational Differences 


It has long been known that I.Q.'s of children tend
to vary with the socio-economic status of their families,
and that within any one socio-economic group there is
considerable residual variation which is independent of
socio-economic level. The material which can be
presented here is extensive enough not only to confirm
previous findings but also to permit a breakdown by age so
that possible trends might come to light. We therefore
present means and standard deviations (Table 10) for
four different age groupings.


TABLE 10

L-M COMPOSITE I.Q.'S ACCORDING TO FATHER'S OCCUPATION

Father's Occupational Classification        2-5-1/2     6-9   10-14   15-18
I. Professional                       N          36      31      41      16
                                      M       114.8   114.9   117.5   116.4
                                      σ        15.2    12.7    16.8    10.9
II. Semi-professional and             N          50      52      75      38
    managerial                        M       112.4   107.3   112.2   116.7
                                      σ        14.2    12.3    15.8    12.6
III. Clerical, skilled trades,        N         175     199     243      85
    and retail business               M       108.0   104.9   107.4   109.6
                                      σ        14.6    14.7    16.4    15.5
IV. Rural owners                      N          59     106     154      91
                                      M        97.8    94.6    92.4    94.3
                                      σ        15.0    13.7    15.9    17.8
V. Semi-skilled, minor clerical,      N         224     249     289     104
    minor business                    M       104.3   104.6   103.4   106.7
                                      σ        14.7    14.4    16.1    15.2
VI. Slightly skilled                  N          59     ...      90      32
                                      M        97.2   100.0   100.6    96.2
                                      σ        18.9    12.8    15.3    14.5
VII. Day laborers, urban and          N          30      58      61      21
    rural                             M        93.8    96.0    97.2    97.6
                                      σ        13.1    13.0    15.9    11.5


The Goodenough scheme of classifying on the basis
of the occupational status of the father was followed. We
hold no brief for this particular method of rating; it is
as satisfactory or as unsatisfactory as one chooses to
believe, and we would prefer a better scheme. It should be
noted, however, that such differences as are found
between the average I.Q.'s for the seven classification
groups will have occurred in spite of the inadequacies
of the classification scheme. The variability within a
class is, of course, partly due to the heterogeneity of
the occupations which go to make up the class.

The results as summarized in Table 10 need little
discussion. The I.Q. means for children whose fathers
are in the professional group tend to be about 18 or 20
points higher than the means for those whose fathers are
in groups IV, VI, and VII. The overlapping of the
highest with the lowest group is such that only about 10 per
cent of the children in the latter group exceed the mean
I.Q. for the former. There is also the fact that about
10 per cent of the children of professional men have
I.Q.'s which are below the general average. This could
be a genetically determined phenomenon which occurs
despite the supposedly superior environment provided in
the homes of the professional class. One also notes
from Table 10 that the range of means is as great for
the lowest age grouping¹ as for the later ages.
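
The degree of overlapping can be approximated from the means and standard deviations of Table 10 alone. The sketch below is illustrative only: it assumes that I.Q.'s within the day-laborer group are normally distributed, uses the ages 10 to 14 column, and the function name is hypothetical. The percentages in the text may well have been counted from the actual distributions.

```python
import math


def fraction_above(threshold: float, mean: float, sd: float) -> float:
    """Share of a normal distribution (mean, sd) lying above `threshold`."""
    z = (threshold - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Table 10, ages 10-14: group VII (day laborers) against the group I mean.
day_laborer_mean, day_laborer_sd = 97.2, 15.9
professional_mean = 117.5

share = fraction_above(professional_mean, day_laborer_mean, day_laborer_sd)
print(f"Day-laborer children above the professional mean: {share:.1%}")
```

On these figures the share comes out close to the 10 per cent stated for the overlap of the lowest group with the highest.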

It is difficult to discern any trends with age in
Table 10. Minor or chance fluctuations may be found
where the N's are relatively small, and groups IV, VI,
and VII tend to switch in relative positions. The low
values for group IV would indicate that few, if any,
bankers and large ranch owners have been included in the
sample as rural owners.

¹ These means will not check with those given in Table 12 of
Measuring Intelligence, which were based on ages 2-1/2 to
5-1/2, erroneously transcribed as 2 - 5-1/2.


Sibling Resemblances


The data for siblings which we can assemble are
unique in several respects. (1) The I.Q.'s, being based
on a composite of Forms L and M, are less subject to
measurement errors than those used in previous studies
of sibling resemblances. (2) Scores on the same scale
are available for sibling pairs with wide age differences
for the two members of a pair. (3) By subdividing, it
is possible to report correlations for preschool versus
preschool sibling; for preschool versus older, in school,
sibling; for young, say 6 to 11, versus older, 12 to 18,
sibling; for pairs with both members between ages 6 to
11; ditto, ages 12 to 18, and for all sibling pairs
regardless of age. Such correlations are in reality
correlations between indices, but we show in Appendix A
that there is no spurious element involved. (4) Our
sibling samples may be regarded as being more representative
than those used in previous studies.

TABLE 11

SIBLING RESEMBLANCES

                                 No. of     No. of
                                 families   pairs     σ1     σ2      r
All possible                       263       384     16.4   16.4   .53*
Ages 12 to 18 only                  34        38     17.2   17.2   .54*
Ages 6 to 11 only                   70        80     16.1   16.1   .57*
Ages 6 to 11 versus 12 to 18        80       104     18.1   15.6   .48
Ages 2 to 5-1/2 only                41        42     15.6   15.6   .55*
Ages 2 to 5-1/2 versus 6 to 18      81       119     15.9   15.1   .52

* Double-entry

It should be noted that twin pairs were not included
as such, although each member of a pair was plotted
against other siblings in the family. The correlations
were computed from double-entry scatter plots except in
those cases where selected younger were being correlated
with older siblings. The standard errors for the
resulting correlation coefficients have not been determined
for the simple reason that no formula is available which
is adequate for the sibling situation involving a varying
number of children per family. A maximal value will be
yielded by using the number of families as N.
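
The double-entry procedure mentioned above can be illustrated as follows. This is a sketch of the general technique, not of the author's tabulation: each pair is entered twice, once in each order, so that the result does not depend on which sibling is called the first member. The I.Q. values are invented for illustration and the function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Ordinary product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def double_entry_r(pairs):
    """Correlation with every pair entered in both orders (a, b) and (b, a)."""
    xs = [a for a, b in pairs] + [b for a, b in pairs]
    ys = [b for a, b in pairs] + [a for a, b in pairs]
    return pearson_r(xs, ys)


# Hypothetical sibling I.Q. pairs, made up purely for illustration.
SIBLING_PAIRS = [(100, 110), (120, 115), (90, 95), (105, 100)]

print(f"double-entry r = {double_entry_r(SIBLING_PAIRS):.2f}")
```

Because both members of every pair appear on each axis, the two marginal distributions are identical, which is why the double-entry rows of Table 11 show the same S.D. in both columns.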

The results are summarized in Table 11. If these
correlations were corrected for attenuation, they would
run about .02 or .03 higher. Thus, without any further
corrections, it can be said that our 384 pairs of siblings
representing 263 families could be expected to show a
resemblance of about .55 or .56 if measurement errors
were not present. The more interesting facts, however,
are the coefficients for preschool versus preschool (i.e.
siblings of ages 2 to 5-1/2) and the preschool versus
older siblings. Evidently the factors which tend to
produce family resemblances in intelligence are not only
operative at early ages but also continue to have an
influence in maintaining that resemblance, or else new factors
having the same effect are present.
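
The size of the correction for attenuation can be seen from the usual formula, in which the obtained correlation is divided by the square root of the product of the two reliabilities. The sketch below is illustrative: the reliability of .95 for the L-M composite is an assumed value chosen for the example, not a figure taken from this chapter, and the function name is hypothetical.

```python
import math


def correct_for_attenuation(r_observed: float,
                            reliability_x: float,
                            reliability_y: float) -> float:
    """Estimate the correlation that would obtain with error-free measures."""
    return r_observed / math.sqrt(reliability_x * reliability_y)


# ASSUMPTION: composite reliability of about .95 for both siblings' scores.
ASSUMED_RELIABILITY = 0.95

corrected = correct_for_attenuation(0.53, ASSUMED_RELIABILITY, ASSUMED_RELIABILITY)
print(f"obtained r = .53, corrected r = {corrected:.3f}")
```

With a reliability of this order the obtained .53 rises by between .02 and .03, which is the magnitude of change described in the text.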
Chapter V
SEX DIFFERENCES 


It is not our purpose to discuss at length the
importance of sex differences, nor shall we attempt an
explanation of, or rationalization for, the interest of
psychologists in the problem. One might, by some
speculation, arrive at the notion that the reason for a section
on this topic in so many research reports can be
subsumed under one of the following three headings: a real
interest in sex differences per se, or an interest
incidental to the problem of uncontrolled variables in
experimental work, or a mere following of the tradition of
including a section devoted to sex differences. All three
types of motivation have contributed to our knowledge
concerning differences due to sex. Whether or not such
differences as have been demonstrated have social
significance may be open to question, but as regards their
import for experimental control there can be no question;
for when sex differences are not present, conclusions
from an experiment become more general and less
subject to qualifications.

One who would construct a test of intellectual
capacity has two possible methods ... of sex differences.
... they reflect sex differences ... To the extent that this
assumption ... justified in eliminating from ...
which yield large sex differences, and by this method
may be able to dispense with separate norms.

There are, of course, limits beyond which one
would hesitate to go in defense of either of these
methods. For example, the person inclined to favor separate
norms for the sexes would nevertheless avoid using test
items obviously unfair to one sex or to the other, such
as making a dress or constructing a kite. On the other
hand, one who favors the second procedure must admit
that it rests upon an assumption, and that in the case of
certain test items the assumption may be in error.
Certainly the absence of sex differences cannot be proved
by the simple expedient of refusing to use items which
show such differences!

The authors of the New Revision have chosen the 
second of these alternatives and have sought to avoid 
using test items showing large sex differences in per 
cents passing. Their choice rests upon the empirical 
fact that test batteries of extensive scope and varied con- 
tent as a rule yield only small sex differences in total 
scores, and that when individual test items do show large
sex differences these can often be accounted for in terms 
of known differences in environment or training. How- 
ever, because of the limited number of test items avail- 
able at a given age level it was not possible to eliminate 
all of the items which showed sex differences. The ex- 
tent to which sex-differentiating items were balanced will 
become evident in the ensuing section of this chapter. 
Subsequently, information will be given on those retained 
and eliminated items which yielded statistically signifi- 
cant sex differences. 


Differences in I.Q. or Total Score


TABLE 12

SEX DIFFERENCES IN I.Q.
(Composite of Forms L and M)

                    Means            S.D.'s
Age               M        F        M       F
2               105.5    108.0     17.1    12.6
2-1/2           105.7    115.5     17.3    20.8
3               107.9    104.6     16.6    20.1
3-1/2           106.6    114.0     17.3    14.1
4               105.1    105.4     13.5    17.9
4-1/2           103.7    106.3     15.2    15.4
5               103.2    107.2     14.5    12.7
5-1/2           101.6    101.2     13.4    14.3
6               102.0    100.5     13.2    11.6
7               103.4    101.2     15.3    16.1
8               104.3    101.8     14.4    16.2
9               106.1    103.2     16.2    15.8
10              104.2    103.8     15.4    16.3
11              104.4    104.1     17.2    17.9
12              103.4    102.6     20.7    18.6
13              104.9    102.0     16.8    18.5
14              101.4    100.0     16.6    15.9
15              106.0     99.3     19.9    17.9
16              101.0    101.2     17.1    17.4
17              106.6    103.6     14.3    13.1
18              106.4    108.3     17.7    15.5
2 - 5-1/2       104.8    107.8     15.8    16.9
...             103.8    102.1     16.4    16.5
...             105.2    103.0     17.4    16.4

The success of the aim to produce a scale which
will yield comparable I.Q.'s for the sexes may be seen
by examining Table 12. The data in this table are based
upon L-M composite scores obtained by the final scoring
procedures. The nearest approach to statistically
significant differences between means will be found at ages
2-1/2 and 3-1/2 where the critical ratios are 2.6 and
2.4 respectively. These 10- and 8-point differences lose
some significance because of a 3-point reversal at age 3.
When ages 2 to 5-1/2 are combined (see bottom of table),
the difference between the sex means is 3 points, which
is 2.6 times its standard error. This suggests the
likelihood that the scales are favorable to slightly higher
scoring for girls at these early ages. For later ages
the means for boys are rather consistently higher than
those for girls. The average difference for ages 6 to 18
combined (1121 cases of each sex) is about 1.8, with a
standard error of about .7, from which we cannot
conclude that there is or is not a real sex difference in
measured I.Q.'s at these age levels. We can be fairly
confident, however, that the true difference is reasonably
small, and consequently an obtained I.Q. need not be
circumscribed because of sex. That intellect can be defined
and measured in such a manner as to make either sex
appear superior will become apparent in the next section.
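
The critical ratios quoted above are differences between means divided by the standard error of the difference. A minimal sketch for ages 6 to 18 combined follows. It treats the boys and girls as independent samples, and the S.D. of 16.5 used for each sex is an assumed round figure of the order shown in Table 12, not a value reported for the combined group; the function names are hypothetical.

```python
import math


def se_of_difference(sd_a: float, n_a: int, sd_b: float, n_b: int) -> float:
    """Standard error of the difference between two independent means."""
    return math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)


def critical_ratio(difference: float, standard_error: float) -> float:
    """Difference between means expressed in units of its standard error."""
    return difference / standard_error


N_PER_SEX = 1121        # stated in the text for ages 6 to 18
ASSUMED_SD = 16.5       # ASSUMPTION: illustrative S.D. for each sex
MEAN_DIFFERENCE = 1.8   # stated in the text

se_diff = se_of_difference(ASSUMED_SD, N_PER_SEX, ASSUMED_SD, N_PER_SEX)
ratio = critical_ratio(MEAN_DIFFERENCE, se_diff)
print(f"S.E. of difference = {se_diff:.2f}, critical ratio = {ratio:.2f}")
```

Under these assumptions the standard error comes out at about .7, as in the text, and the difference of 1.8 is between two and three times its standard error.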


Sex Differences by Items 


One of the sources of conflicting data regarding sex
differences in mental ability must be attributed to differences
in tests. Certainly, tests which bear the same label are apt to
be quite different as to content and as to the kind of ability
called for. One does not have to pursue this line of thought
very far to realize that all the issues revolving around the
organization of abilities are here involved. The problem would,
perhaps, be greatly clarified if the factor analysts should
succeed in two things: first, the isolating and unequivocal
tagging of abilities; and second, the constructing of tests by
which such ‘pure’ traits or abilities can be measured
independently.
With our present state of knowledge (mostly ignorance)
concerning just what abilities exist and which test measures
what, it may be more profitable to study sex differences via the
individual items. Such a procedure will overcome one of the
chief limitations to the study of sex differences by way of
scores based upon a composite of items, namely the likelihood
that such differences as emerge may be due to a particular
cluster of items which call for some factor or ability unknown
to or unrecognized by the investigator. There is also the
possibility that a composite (aggregation of items) may be so
balanced, either by design or accident, as to mask real
differences. Now, both an item and a composite will likely call
for a complex of abilities, but the factorial composition of a
single item should be simpler than that of a composite unless
the latter has been constructed as a ‘pure’ measure of some one
ability or unless it consists of an aggregation of highly
similar items such as, for example, the verbal analogies subtest
of the typical group scale of intelligence. A knowledge of just
what types of item situations reflect sex differences should
also make it easier to theorize as to possible causes of the
differences.



The analysis of item data for sex differences must depend upon
the per cent passing and failing each item. The revision data
are such that a comparison for a single item can be made for
several different age groups; thus for each item one could
compute several critical ratios for the sex difference in per
cent passing. It would seem desirable to have a single index for
the statistical reliability of the differences on an item.
Obviously, one cannot justifiably combine ages so as to have a
single per cent for each sex, nor can one average the several
age per cents. A suitable single index can be obtained by
computing, for each age, chi square from the fourfold table
formed by classifying passing and failing by sex, then summing
the chi squares for the several ages. The number of degrees of
freedom will equal the number of chi squares summed, i.e. the
number of ages for which separate values of chi square are
determined. The number of age levels usable for a given item
will depend upon the spread of failing and passing for the item,
but the extremes cannot be used because of the inapplicability
of chi square when any one expected frequency in the fourfold
table is small.


TABLE 13

EXAMPLE OF CHI SQUARE AS APPLIED TO TESTING THE SIGNIFICANCE
OF SEX DIFFERENCES

Age               6                 7                 8                 9
               +    -    N       +    -    N       +    -    N       +    -    N
B             18   84  102      36   66  102      44   58  102      66   37  103
G              8   93  101      20   80  100      39   62  101      49   52  101
Total         26  177  203      56  146  202      83  120  203     115   89  204

Chi square                                           .43               5.02
As an example of the use of the chi square technique, let us
look at Table 13, wherein will be found the frequency of passing
(+) and failing (-) for each sex by separate ages for the item
‘orientation: form.’ Under each fourfold table will be found the
chi square for that table. Taken singly no one chi square is
significant (it will be recalled that a chi square based on one
degree of freedom corresponds to the square of a critical
ratio), but all four differences are in the same direction, so
one need not be surprised at the significant χ² probability of
.0036, which the reader will recognize as being near the P
yielded by a critical ratio of 3.
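The summation of chi squares over ages can be sketched as follows. This is an illustration under stated assumptions rather than the computing routine actually used: the cell frequencies are read from Table 13, where the age labels 6 to 9 and the age-7 entries 36 and 80 are reconstructed from the marginal totals; no continuity correction is applied; and the closed-form tail probability holds only for an even number of degrees of freedom.

```python
import math

# Fourfold tables from Table 13: (boys +, boys -, girls +, girls -).
# Age labels and the age-7 cells 36 and 80 are reconstructions from the
# marginal totals, not clearly legible values.
TABLE_13 = {
    6: (18, 84, 8, 93),
    7: (36, 66, 20, 80),
    8: (44, 58, 39, 62),
    9: (66, 37, 49, 52),
}

def chi_square_fourfold(a, b, c, d):
    """Chi square (1 df, no continuity correction) for a 2 x 2 table."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def chi_square_upper_tail(x, df):
    """P(chi square >= x) in closed form; valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

per_age = {age: chi_square_fourfold(*cells) for age, cells in TABLE_13.items()}
summed = sum(per_age.values())
p_value = chi_square_upper_tail(summed, df=len(per_age))

for age, value in per_age.items():
    print(f"age {age}: chi square = {value:.2f}")
print(f"sum = {summed:.2f} on {len(per_age)} df, P = {p_value:.4f}")
```

Under these assumptions the summed chi square comes to roughly 15.6 on 4 degrees of freedom, and the resulting P is close to the .0036 reported in the text.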

The use of chi square does not eliminate the question as to
what shall be accepted as a critical value for significance
versus insignificance. A P of .05 may be sufficiently low to
suggest non-chance differences, but a smaller value should be
demanded before one can be very sure of the reality of the
obtained differences. We shall report here only on those items
which yield P values of .01 or less. Smaller values of P will
tend to give us still greater confidence for concluding that a
real sex difference exists. The chi square technique, like the
ordinary critical ratio method, does not yield a measure of the
degree of association, and consequently it cannot be inferred
from two chi square P values that the association is
necessarily stronger for the item yielding the smaller P.

The essential data on item sex differences are summarized in
Tables 14 and 15. These tables include items which were
eliminated and items which were retained. The locations of the
latter in the final forms are indicated. The items have been
arranged according to the age groups utilized in the sex
comparisons, and these ages have been specified in the tables.
The reader will recall that half-age groups, 2-1/2, 3-1/2,
4-1/2, and 5-1/2, were tested, and that there are approximately
50 of each sex in age groups 2 to 5-1/2 and 15 to 18, and about
100 of each sex in age groups 6 to 14. More exact N's for the
data on any one item may be obtained by combining the
information in Tables 14 and 15 regarding ages used and the
several age-sex N's given in Table 12.

A word is in order regarding the varying ranges of ages used.
The requirement that expected frequencies, for passing or
failing, be 10 or greater constitutes the principal restriction
for usable ages. For a few items, as for example the sample item
in Table 13, an additional restriction was involved which had to
do with the faulty location of the items in the tryout forms.
These restrictions will not tend to enhance or diminish sex
differences, but will heighten our confidence in the adequacy of
the data used in the comparisons. The two ‘plan of search’ items
constitute the only exceptions to the foregoing procedure. For
both these items, there were very minor sex differences prior to
age 13, then marked and consistent differences from 13 to 18;
hence we report data only for these later ages, and consequently
any conclusion must be modified accordingly. Incidentally, these
two items and two of the ‘copying a bead chain from memory’
items were the only items of all those in the provisional tryout
forms which seemed to suggest the emergence of a sex difference
with age. In all other cases the differences were either
consistent and significant or inconsistent and insignificant as
regards the comparisons at the several ages.
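The rule that every expected frequency be 10 or greater can be checked mechanically for any fourfold table. The sketch below assumes the usual expected frequency (row total times column total over N); the function names and the second, sparse table are hypothetical and serve only to show an age level that would be excluded.

```python
def expected_frequencies(a, b, c, d):
    """Expected cell frequencies of a 2 x 2 table under independence."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    return [r * k / n for r in rows for k in cols]

def age_is_usable(a, b, c, d, minimum=10):
    """True when every expected frequency meets the stated minimum."""
    return min(expected_frequencies(a, b, c, d)) >= minimum

# Age-6 frequencies from Table 13 (boys +, boys -, girls +, girls -).
print(age_is_usable(18, 84, 8, 93))

# Hypothetical sparse table: too few passes for chi square to apply.
print(age_is_usable(4, 98, 3, 98))
```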
It should also be noted that data for a recurring test are
presented for the test only at one pass level unless it is
possible, as in the case of one test, ‘orientation: form,’ to
make comparisons for a different set of age groups. For
instance, test L, XIV, 4, ‘ingenuity,’ which is scored as 1 plus
at this level and for which sex comparisons can be made for ages
11 to 17, recurs as L, A.A., 6 with score as 2 plus, but the
possible sex comparisons would involve ages 12 to 18. Obviously,
the presentation of such duplicative data is not only
unnecessary but actually questionable. As a matter of record, it
can be stated that when a recurring test showed a significant
sex difference for one passing standard, the difference was also
present for all passing standards.


TABLE 14
ITEMS ON WHICH GIRLS SURPASS BOYS

Location      Name                               Ages             χ² P
              Picture memories                   2 - 3            .0046
              Counting (reciting numerals)       2-1/2 - 4        .0007
              Paper folding: square              2-1/2 - 4        .00024
              Buttoning                          3 - 4            .0013
L, IV-6, 1    Aesthetic comparison               3 - 5            .00018
              Aesthetic comparison               3-1/2 - 5-1/2    .0014
M, III-6, A   Matching objects                   3-1/2 - 5        .010
L, V, 2       Paper folding: triangle            4 - 5            .0010
              Tying a bow knot                   5-1/2 - 8        .000014
              Age discrimination                 8 - 12           .0006
M, XI, 2      Copying a bead chain from memory   7 - 17           .0065
L, XII, 6     Minkus completion                  10 - 17          .0010
M, A.A., 4    Codes I                            10 - 18          .0065


We are now ready to discuss such sex differences as exist for
single items. First we note from Tables 14 and 15 that there are
apparently more items on which boys surpass girls than [...]
school levels, [...] ages and of boys [...]. But the greater
number of [...] apparent than real. Actually [...] superior.
This repetition of forms for certain items, by which 25 items
are reduced to 14 item situations, is important as regards the
study of sex differences per se, but as regards the New Revision
as a measuring instrument the inclusion of variant forms of
items which yield such sex differences may be regarded as
unfortunate insofar as their presence has not been entirely
balanced by items favorable to the opposite sex (see Table 12).

The interpretation of the findings summarized in Tables 14 and
15 is fraught with difficulties. Most of the superiority for
girls occurs at the lower age levels on items which seem to
involve some type of manipulative ability (‘buttoning,’ ‘tying
knot’) or discrimination (‘aesthetic comparison,’ ‘picture
memories’) or number facility (‘counting,’ ‘matching objects’).
The direction of the difference for ‘buttoning,’ ‘tying knot,’
and ‘aesthetic comparison’ (‘Which one is prettier?’) is
plausible, but the differences on the other items at these lower
ages are baffling.

The superiority of girls on ‘age discrimination’ (from
pictures) fits in with the notion that girls are more interested
than boys in people and social matters. The difference on
‘copying a bead chain from memory’ is in reality small; a
comparison on about 950 of each sex is necessary to produce the
not so significant P of .0065. The superiority of the girls on
this item is somewhat greater and more significant for ages 11
to 17 than for 7 to 17; there is some indication of emergence at
age 11, a trend also present for the parallel item in Form L
(XIII, 6) which yields non-significant differences in the same
direction as M, XI, 2. The similar and easier item at L, VI, 2
shows no sex difference; neither does the non-memory ‘copying of
a bead chain’ (M, VI, 2). The meaning of the superiority of the
girls on ‘Minkus completion’ is open to question since the
parallel item on Form M shows only a slight difference for the
same age samplings. Likewise, one wonders what psychological
significance can be attached to the difference on ‘codes’ since
the data for the other three code items, two retained and one
not retained in the final forms, show an utter lack of sex
difference.
We have remarked earlier that a part of the inconsistencies
among the data on sex differences is due to tests with the same
label possibly being measures of different abilities. We now see
that highly similar items do not yield consistent sex
differences. We are unable to see any reason why a difference
should emerge on this particular ‘Minkus completion,’ or
‘codes,’ or ‘bead chain’ item, and not on parallel items.

Let us now turn to the items upon which boys tend to excel
(Table 15). The most striking fact shown in this table is the
marked difference, highly significant and consistent with age,
on ‘picture absurdities.’ That absurdity per se is not the
sex-differentiating factor is evident from the absence of a
difference for ‘verbal absurdities.’ The results for tests
involving ‘orientation’ are also striking and perhaps not so
surprising. It should be reported that ‘orientation: direction
II’ on Form M, with frequencies too low for adequate statistical
treatment, exhibits the same trend. It may be that boys are
superior in space ability, and that the ‘orientation’ items,
along with ‘block counting’ (from pictures) and ‘plan of
search,’ depend somewhat on such an ability. The ‘substitution’
item as here used might also involve space. It might have been
anticipated that the ‘induction,’ ‘arithmetical reasoning,’ and
‘ingenuity’ items, and certainly ‘word naming: vehicles,’ would
favor the males.

The three items of Table 15 not yet discussed are of especial
interest in that parallel items fail to show similar
differences. Of the 7 ‘opposite analogies’ items, only one shows
a sex difference. It contains the sub-items: ‘the rabbit's ears
are long, the rat's are _____; snow is white, coal is _____; the
dog has hair, the bird has _____; wolves are wild, dogs are
_____.’ Of 7 ‘comprehension’ items, only one yields a
significant difference. One of its three questions is ‘What
makes a sailboat move?’ Of 4 sets of ‘abstract words,’ the one
upon which boys excel requires definition for the words
‘connection,’ [...] ‘conquer,’ ‘obedience,’ and ‘revenge.’ It
happens that a parallel item containing the words ‘pity,’
‘curiosity,’ ‘grief,’ and ‘surprise’ shows slight, though
consistent from age to age, differences in favor of girls.


TABLE 15

ITEMS ON WHICH BOYS SURPASS GIRLS

Location         Name                             Ages       χ² P
                 Picture absurdities                         .0026
L, VII, 1        Picture absurdities I                       <.000001
M, VII, 3        Picture absurdities I                       .0010
                 Orientation: form I (1+)         6 - 9      .0036
L, VIII, 5       Comprehension IV                            .00026
                 Opposite analogies                          .0010
M, X, 1          Block counting                              .0007
                 Orientation: form II                        .00011
                 Picture absurdities II                      <.000001
                 Picture absurdities II                      .00002
                 Picture absurdities III                     <.000001
                 Word naming: vehicles                       <.000001
                 Orientation: direction I                    .00005
                 Memory for stories II: acrobat     -18      .0002
                 Orientation: form I (2+)           -18      .00001
L, XI, 3         Abstract words I                   -15      .0070
L, XIV, 5        Orientation: direction I           -17      .000003
L, XIV, 4        Ingenuity                        11 - 17    .000001
L, XIV, 2        Induction                          -18      .000003
L, A.A., 4       Arithmetical reasoning             -18      .0092
M, XIV, 5        Ingenuity                          -17      .0016
                 Substitution                       -18      .00008
L, XIII, 1       Plan of search                   13 - 18    .0082
M, XIII, 1       Plan of search                   13 - 18    .0010
L, S.A. III, 2   Orientation: direction II          -18      .0040

The results for these three types of items (‘comprehension,’
‘abstract words,’ and ‘opposite analogies’) plus similar
inconsistencies for some items showing female superiority, tend
to point to one conclusion: namely, that sex differences are apt
to be a function of the content of an item rather than of any
basic abilities called for by the item. In other words, observed
sex differences in scores may be either highly specific to an
item situation or a reflection of a real difference in ability.


Chapter VI 
DATA ON RELIABILITY 


It is the purpose of this chapter to set forth more information
on the reliability of the scales than was feasible in the
previous volume. As stated therein, the ordinary form versus
form coefficient of reliability is inapplicable to I.Q. data
because of an apparent lack of homoscedasticity in the scatter
plots. This tendency for the scatters to be fan-shaped was
evident in nearly all the scatter diagrams, so that it was
assumed to be a real, non-chance phenomenon. This was taken to
indicate that the reliability of an I.Q. score is a function of
the magnitude of the I.Q. itself, and in order to determine the
error of measurement associated with a given I.Q., resort was
made to estimation by way of the average difference between
I.Q.'s derived from Form L and Form M. The resulting reliability
data reported in the previous volume were confined to large I.Q.
groupings for ages 3 to 18 combined.

More specifically, we propose in this chapter (1) to present
evidence to show that the lack of homoscedasticity is
non-chance; (2) to break down the data into three age
combinations with smaller I.Q. groupings; (3) to supplement the
average-difference method, which assumes normality of
distribution for the errors of measurement, by another scheme
which is not subject to this limitation; (4) to state more fully
why we believe that the observed facts regarding the dependence
of the errors of measurement upon the magnitude of the I.Q.
might have been anticipated on logical grounds; and (5) to
provide information on the accuracy of mental age scores.

When the several reliability scatter plots were made for the 21
age groups, a tendency toward a fan type of distribution was
noticeable in 17 of the plots. In order to get a statistical
measure of this heteroscedasticity, the differences between Form
L and Form M I.Q.'s were plotted against the composite I.Q.'s,
obtained by combining results from the two forms. The computed
coefficients of correlation between I.Q. differences and I.Q.
magnitude were positive in 18 instances, and ranged from -.103
to .240 with a median value of about [...]. Only one of the
separate values could be deemed significantly different from
zero, but the consistently positive coefficients from
independent samples suggest that something more than chance is
operating here. When ages 2-1/2 to 18 are combined, the
correlation is .135 with a standard error (by the orthodox
formula) of .018. This coefficient is 7-1/2 times its sampling
error and is therefore definitely greater than zero. In fact, we
can be reasonably sure that the universe value for r is greater
than .08.
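The figures for the combined ages can be checked with a few lines of arithmetic. The sketch assumes that the "orthodox formula" is the familiar (1 - r²)/√N, that r is the .135 of the bottom row of Table 16, that N is the 2970 cases obtained by summing the three age groups of that table, and that "reasonably sure" corresponds to a margin of three standard errors; none of these is stated outright in the passage.

```python
import math

def standard_error_of_r(r, n):
    """Large-sample standard error of a correlation: (1 - r^2) / sqrt(n)."""
    return (1.0 - r * r) / math.sqrt(n)

r = 0.135             # combined correlation, bottom row of Table 16
n = 728 + 1623 + 619  # the three age groups of Table 16

se = standard_error_of_r(r, n)
ratio = r / se
lower_limit = r - 3.0 * se  # assumed three-standard-error margin

print(f"SE = {se:.3f}, r/SE = {ratio:.1f}, lower limit = {lower_limit:.3f}")
```

On these assumptions the standard error is about .018, the ratio about 7-1/2, and the lower limit just above .08, in agreement with the statements in the text.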


TABLE 16

REGRESSION OF I.Q. DIFFERENCES, y,
ON COMPOSITE I.Q., x

Age                 N       M_x      σ_x     M_y     σ_y    r_xy    η_yx
2-1/2 - 5-1/2      728    106.27    15.93    6.00    4.78   .108
6 - 13            1623    103.69    16.60    5.03    3.92   .131    .19
14 - 18            619    102.97    16.77    4.47    3.50   .148    .205

                                    16.52    5.15    4.10   .135    .156


[...] year-old children cannot fall below 75, a factor which
tends to limit the difference between Form L and Form M I.Q.'s
for those who are retarded.) Despite the smaller N's for the
three age groups, the three correlations [...] considered chance
deviations from zero. These facts tend to substantiate the
conclusion that the lack of homoscedasticity in the original
reliability plots is statistically significant. That such small
correlations as .135 can possess more than statistical
significance will become apparent and real in the pages to
follow.

An examination of the four scatter diagrams from which the data
of Table 16 were derived indicated that the regression of I.Q.
differences on I.Q. magnitude might not be linear; so etas for
differences on I.Q. were computed. In order to test for lack of
linearity, Fisher's¹ analysis of variance method was used. For
the 2-1/2 to 5-1/2 group there is no evidence of real
curvilinearity; for ages 6 to 13, Fisher's test indicates that
the probability of as great a departure from linearity is less
than .001, and accordingly an assumption of linearity is not
tenable; for ages 14 to 18, the observed discrepancy from
linearity would arise by chance 5 times out of 100, which
suggests a real departure from linearity. For ages 2-1/2 to 18
combined the probability of as great a deviation from linearity
is .01. In view of these probability figures it does not seem
safe to assume that the relationship between I.Q. size and I.Q.
differences, i.e. I.Q. accuracy, is linear. The curvilinear
trends will be described subsequently and factors which may
possibly explain the lack of linearity will be suggested.

1
R. A. Fisher, Statistical Methods for Research Workers (6th
ed.), pp. 257-261. Edinburgh: Oliver and Boyd, 1936.

It is convenient at this time to restate why we think it very
logical that the standard error of measurement for I.Q.'s should
vary in such a way that greater accuracy is associated with
lower than with higher I.Q.'s. Suppose we have an individual
whose M.A. is 100 months and we accept 4.42 months as the
standard error of measurement to be associated with a mental age
of 100 months. If the individual's C.A. were 100 months, his
I.Q. would be 100 ± 4.42; if his C.A. were 80, his I.Q. would be
125 ± 5.52; and if his C.A. were 125, his I.Q. would be
80 ± 3.54. (These three standard errors of measurement for the
I.Q.'s follow from the principle that if a measure is
transformed by division, its error must also be likewise
transformed.) Now suppose that three individuals have mental
ages of 96, 120, and 144 months. It seems reasonable to
associate with each of these three mental ages that error of
measurement which is found for individuals who usually score at
these three levels. The proper errors would therefore be the
errors found for individuals of C.A. 96, 120, and 144
respectively, and even though the ‘reliability’ coefficients
were the same for these three age groups, the errors of
measurement for M.A.'s would be different because of the
increasing variability of the M.A. distributions as we pass from
C.A. group 96 to 120 to 144. The respective observed errors in
months are approximately 4.4, 5.2, and 5.9. If the C.A.'s for
all three individuals were 120, their I.Q.'s would be 80 ± 3.7,
100 ± 4.3, and 120 ± 4.9. It might be argued that it is proper
to consider only 5.2 as the error of M.A. determination for
individuals of C.A. 120, but this would lead us astray since the
reliability scatter plot of mental ages for a single
chronological age shows the same fan-shape as the I.Q. scatter
plot.
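The principle that an error is transformed by the same division as the measure itself can be made concrete in a short sketch. The function name is illustrative; the inputs are the mental ages, chronological ages, and errors (all in months) quoted in the paragraph above.

```python
def iq_with_error(mental_age, chronological_age, error_in_months):
    """Return (I.Q., standard error of the I.Q.) for ages given in months."""
    iq = 100.0 * mental_age / chronological_age
    error = 100.0 * error_in_months / chronological_age
    return iq, error

# One M.A. of 100 months (error 4.42) referred to three different C.A.'s.
for ca in (100, 80, 125):
    iq, err = iq_with_error(100, ca, 4.42)
    print(f"C.A. {ca}: I.Q. {iq:.0f} +/- {err:.2f}")

# Three M.A.'s with their own errors, all referred to a C.A. of 120.
for ma, err_months in ((96, 4.4), (120, 5.2), (144, 5.9)):
    iq, err = iq_with_error(ma, 120, err_months)
    print(f"M.A. {ma}: I.Q. {iq:.0f} +/- {err:.1f}")
```

Because the divisor is the same for the score and its error, a fixed error in months becomes a larger error in I.Q. points as the C.A. falls, which is the dependence of accuracy on I.Q. level that the argument requires.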
Perhaps an analogy will help clarify the issue. 
Suppose we were measuring the heights of individuals 
with a one-inch ruler, and that it is shown empirically 
that the shorter are measured more accurately than the 
taller children. No one could object to attaching to a 
child’s height, regardless of his age, that error which 
is usually associated with like heights. If the given 
height in inches were transformed to some index of 
height, the error in inches must also be transformed in- 
to index units. Now, the increase in scatter of M.A. 
distributions with age obviously means that the error of 
measurement for M.A.’s also increases, and therefore 
the higher the M.A. score, the larger its absolute error. 
Our argument is predicated upon the idea that the error 
of measurement for a given M.A. is not entirely a function
of the age of the child but is a function of his level 
of performance in terms of mental age. 

Let us now consider the two methods used to obtain the
reliability of I.Q. scores. In view of an appreciable difference
in reliability as we pass from the preschool to the higher age
levels, we have broken down the total into three age
combinations: 2-1/2 to 5-1/2, 6 to 13, and 14 to 18. The
grouping together of several ages is necessary, as will soon
become apparent, in order to have a sufficient number of cases
for determining adequately the accuracy of I.Q.'s between, for
example, 70 and 80; but the particular age combinations which we
have used are admittedly arbitrary.

The first method of ascertaining reliability involved computing
the average difference between Form L and Form M I.Q.'s for
those individuals whose composite I.Q.'s fall in a given
interval, e.g. 70 - 79, and then multiplying this mean
difference by $\tfrac{1}{2}\sqrt{\pi}$ to obtain the standard
error of measurement, $\sigma_e$, from which the equivalent
reliability coefficient is obtained via the relationship
$r_{11} = 1 - \sigma_e^2/\sigma^2$, where $\sigma^2$ is the
variance for the distribution of I.Q.'s. This method is based on
the assumption that the errors of measurement are normally
distributed, and that the two observed I.Q.'s for each of
several individuals having the same ‘true’ I.Q. can be
considered as though drawn by chance from such a normal
distribution with unknown, but to be determined, variance,
$\sigma_e^2$. It has been shown¹ that when the standard
deviation for a variable is known and pairs of scores are drawn
at random, the expected average difference between pairs will be
$(2 \div \sqrt{\pi})$ times $\sigma$. In the present problem, we
have an observed average difference from which to estimate
$\sigma_e$. That this method of estimating a reliability
coefficient is really satisfactory has been checked empirically
by computing the ordinary form versus form reliability
coefficients from 24 scatter plots (varying N's), and then
recomputing the coefficients via the 24 mean differences between
pairs of scores in a set (no grouping on the basis of composite
scores involved). The average discrepancy between the
coefficients obtained by the two methods was .005, the maximum
discrepancy was .01, and there was no evidence for a directional
bias. We conclude, therefore, that the average difference method
is defensible even though based upon an assumption which may not
be tenable.

1
Q. McNemar, "The Expected Average Difference Between
Individuals Paired at Random," J. Genet. Psychol., 1933, 43,
438-439.
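The average-difference method reduces to two lines of arithmetic, sketched below. The function names are illustrative. The example takes the mean difference of 5.15 from the bottom row of Table 16 (read here as the mean absolute Form L minus Form M difference for all ages, which is an interpretation of that column) together with the standard deviation of 16.6 adopted for the I.Q. distribution.

```python
import math

def error_of_measurement(mean_abs_difference):
    """Sigma_e from the mean absolute difference: (sqrt(pi) / 2) times the mean."""
    return mean_abs_difference * math.sqrt(math.pi) / 2.0

def equivalent_reliability(sigma_e, sigma=16.6):
    """Equivalent reliability coefficient: 1 - sigma_e^2 / sigma^2."""
    return 1.0 - sigma_e ** 2 / sigma ** 2

sigma_e = error_of_measurement(5.15)
r_11 = equivalent_reliability(sigma_e)
print(f"sigma_e = {sigma_e:.2f}, equivalent r = {r_11:.3f}")
```

The multiplier is the reciprocal of the factor 2/√π that gives the expected average difference between randomly drawn pairs, so the step simply inverts that relationship.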

In applying this method, those individuals whose composite
I.Q.'s were 69 or less were grouped together, those above 140
were treated as a group, and the rest were grouped into
intervals of 10 I.Q. points as 70 - 79, 80 - 89, etc. The
average difference between L and M I.Q.'s was computed for each
I.Q. group, and from these averages the equivalent reliability
coefficients, as given in the left-hand column of Tables 17, 18,
and 19, were estimated. A standard deviation of 16.6 has been
used in all cases, and therefore these ‘reliabilities’ are
comparable as far as range is concerned. This σ of 16.6 is the
average of the two sigmas for the L and M I.Q. distributions for
2970 cases, ages 2-1/2 to 18. Had the value of 16.4 (used in the
previous report) been employed again, the herein given
coefficients would be on the average about .002 smaller. Further
discussion of the results in Tables 17-19 will be postponed
until the other method of determining the reliabilities has been
described.

This second method depends essentially on determining, by
direct computation of array variances, how accurately I.Q.'s on
one form can be predicted from I.Q.'s on the other form, and
then asking what the equivalent correlation (in this case,
reliability coefficient) must be for such an error of estimate
or array variance. Scatter plots were made with an interval of
size 2 (small so as to avoid grouping error) for Form L I.Q.'s
as ordinate and an interval of size 5 for M I.Q.'s as abscissa.
From these plots (again age groups must be combined so as not to
have too few cases per array) the variances for the several
vertical arrays about their respective means were computed. This
yielded a separate variance for each of the 60 - 64, 65 - 69,
etc. arrays. Scatter plots were also made with the axes reversed
so that Form M I.Q.'s were along the y-axis, intervals of size
2, with the L values along the x-axis, 5-point intervals. From
these plots a second set of array variances was obtained by the
procedure just described. Thus were provided two variances for
the 60 - 64 array, two for the 65 - 69 array, and so on, but the
two variances differed by chance, as did also the number of
cases in the two corresponding arrays. It would seem reasonable
to average the two variances as a better estimate for a given
array.

The procedure followed, however, was the averag- 
ing of four variances for, say, the 60 - 64 and 65 - 69 
intervals or arrays so as to obtain an estimate of the 
variance for the 60 - 69 array. This final variance is a 
weighted average and is therefore equivalent to combining 
sums of squares of deviations from the four respective 
array means and then dividing by the total number of 
cases in the four arrays. This apparently roundabout 
procedure for obtaining the variances for the 60 - 69, 
70 - 79, etc. arrays was employed in preference to the 
usual method (i.e. combining array distributions) in order 
to reduce somewhat the effect of regression within an 
array. This tendency for regression to increase the ar- 
ray variances has not, however, been completely elimin- 
ated; the computed, and therefore average, variances will 
be slightly exaggerated, and accordingly the estimated 
equivalent reliability coefficients will possess a small 
negative bias. 
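The equivalence asserted above, that the weighted average of array variances equals the pooled sum of squares about the separate array means divided by the total number of cases, can be verified numerically. The four small arrays below are hypothetical scores invented for the illustration; variances are taken with N, not N - 1, in the divisor, as the description implies.

```python
def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Variance about the array's own mean, with N in the divisor."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def pooled_variance(arrays):
    """Sum of squares about each array's own mean, over the total N."""
    total_n = sum(len(a) for a in arrays)
    total_ss = sum(len(a) * variance(a) for a in arrays)
    return total_ss / total_n

# Hypothetical arrays standing in for the four contributing arrays.
arrays = [
    [98, 104, 101, 95],
    [103, 110, 99, 108, 105],
    [100, 96, 107],
    [109, 102, 111, 106],
]

total_n = sum(len(a) for a in arrays)
weighted_average = sum(len(a) / total_n * variance(a) for a in arrays)
pooled = pooled_variance(arrays)
combined = variance([v for a in arrays for v in a])

print(f"weighted average = {weighted_average:.3f}")
print(f"pooled           = {pooled:.3f}")
print(f"combined arrays  = {combined:.3f}")
```

The variance of the merged distribution is never smaller than the pooled value, since merging adds the spread between the array means; this is the effect of regression within a broad array that the procedure was designed to reduce.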

As has already been indicated, each of the several array
variances will correspond closely to the square of the standard
error of estimate in predicting I.Q.'s on one form from I.Q.'s
on the other. The fact that these variances differ
systematically as we pass from array to array was expected
because of the apparent heteroscedasticity of the scatter plots.
The estimated equivalent reliability coefficient is easily
determined from the relationship
$r_{11}^2 = 1 - \sigma_a^2/\sigma^2$, where $\sigma_a^2$ equals
the given array variance and $\sigma$ is 16.6. The estimates
obtained by this method will be found in Tables 17-19, and the
respective N's are given in two columns since the intervals,
60 - 69, etc. on the Form M axis will ordinarily not contain the
same number of cases as the corresponding intervals on the Form
L axis. It should also be noted that the total N's in these two
columns need not agree mutually or with the total N for the
average-difference method. This is due to the fact that the
terminal intervals for the variance method were 60 - 69 and
140 - 149, whereas for the difference method these were 69 down
and 140 up. The difference method utilizes all the cases,
whereas in the variance method no attempt was made to use arrays
below 60 or above 150 because of too few cases. These end
intervals might have been handled otherwise, with a resulting
negligible change in final values.
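The conversion from an array variance to an equivalent reliability coefficient can be sketched as below. The printed formula is damaged in the source; it is read here as the usual standard-error-of-estimate relation, in which the array variance equals σ²(1 - r²), and that reading is an assumption. The array variance of 45.0 is a hypothetical value chosen only for illustration.

```python
import math

SIGMA = 16.6  # standard deviation adopted for the I.Q. distribution

def reliability_from_array_variance(array_variance, sigma=SIGMA):
    """Equivalent r from an array variance, assuming variance = sigma^2 (1 - r^2)."""
    if not 0.0 <= array_variance <= sigma ** 2:
        raise ValueError("array variance must lie between 0 and sigma squared")
    return math.sqrt(1.0 - array_variance / sigma ** 2)

# Hypothetical array variance (squared I.Q. points).
r_equivalent = reliability_from_array_variance(45.0)
print(f"equivalent r = {r_equivalent:.3f}")
```

Larger array variances give smaller coefficients, so the systematic change in variance from array to array appears directly as a change in reliability with I.Q. level.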


TABLE 17
RELIABILITIES FOR AGES 2-1/2 to 5-1/2

              Via av. diffs.     Via variances        Av.    Smoothed
I.Q.           r_11      N      r_11   N_L   N_M     r_11    r_11   σ_e
140 - 149      .850     15      .796    11    11     .823    .834   6.8
130 - 139      .801     29      .912    41    33     .856    .849   6.5
120 - 129      .870    100      .866          96     .868    .874   5.9
110 - 119      .909    160      .888   175   168     .898    .890   5.5
100 - 109      .900    198      .908   191   184     .904    .899   5.3
 90 -  99      .898    125      .892   108   126     .895    .909   5.0
 80 -  89      .930     61      .924    76    63     .927    .914   4.9
 70 -  79      .926     29      .916    20    28     .921    .919   4.7
 60 -  69      .909     11      .911     7     4     .910    .914   4.9


TABLE 18
RELIABILITIES FOR AGES 6 to 13

              Via av. diffs.     Via variances        Av.    Smoothed
I.Q.           r_11      N      r_11   N_L   N_M     r_11    r_11   σ_e
140 - 149                       .903    29    32     .924    .907   5.1
130 - 139                       .896    73    57     .874    .898   5.3
120 - 129                       .890   139   171     .897    .897   5.3
110 - 119                       .921   309   316     .921    .916   4.8
100 - 109                       .922   407   400     .930    .926   4.5
 90 -  99                       .930   333   341     .928    .932   4.4
 80 -  89                       .943   216   196     .938    .941   4.0
 70 -  79                       .950    76    68     .956    .958   3.4
 60 -  69                       .972    30    28     .979    .971   2.8


TABLE 19
RELIABILITIES FOR AGES 14 to 18

              Via av. diffs.     Via variances        Av.    Smoothed
I.Q.           r_11      N      r_11   N_L   N_M     r_11    r_11   σ_e
140 - 149      .942      8                    11     .953    .948   3.8
130 - 139      .950     24                    30     .938    .939   4.1
120 - 129      .930     73                    63     .925    .930   4.4
110 - 119      .928                          126     .928    .925   4.5
100 - 109      .925                          134     .922    .933   4.3
 90 -  99      .948                          124     .949    .944   3.9
 80 -  89      .966                           79     .961    .957   3.4
 70 -  79      .959                           34     .960    .970   2.9
 60 -  69      .992                           16     .988    .979   2.4
A comparison of the equivalent reliability coefficients
(see Tables 17-19) reveals a median discrepancy of .01
between the values obtained by the two methods. We have
no way of evaluating these differences statistically, but
their magnitudes are such that it does not seem
unreasonable to attribute the discrepancies to chance.
There is one disturbing disagreement in Table 17, for the
130 - 139 level, for which, if non-chance, we have no
explanation. It will also be noted that the intra-pair
differences in each table are small relative to the range
of coefficients; i.e. the two estimates tend to vary
together. It would seem that a combination of the values
obtained by the two methods would yield a more dependable
estimate, hence the column in each table headed Av. r11.
These last values have been smoothed by the method of
moving averages (over 3 observations) on the defensible
assumption that the irregularities are partly due to
chance. The smoothed values, and the corresponding
standard errors of measurement, are given in the last two
columns of Tables 17-19.
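
The smoothing step can be illustrated with a short sketch. It takes the Av. r11 column for ages 6 to 13 (Table 18), applies a three-term moving average to the interior levels, and converts each smoothed coefficient to a standard error. Two points are assumptions rather than statements from the text: the standard error is taken as σ√(1 - r) with σ = 16.6, and the two end levels are left out because their treatment is not described. Under these assumptions the interior values agree with the tabled ones to within rounding (the standard errors to within a tenth of a point).

```python
import math

# Averaged reliabilities (Av. r11) for ages 6 to 13, Table 18,
# listed from the 140-149 level down to the 60-69 level.
av_r = [0.924, 0.874, 0.897, 0.921, 0.930, 0.928, 0.938, 0.956, 0.979]

SIGMA_IQ = 16.6  # standard deviation of the I.Q. distribution quoted in the text


def moving_average_3(values):
    """Three-term moving average for interior points only.

    The end values are omitted because the text does not say how
    they were treated.
    """
    return [sum(values[i - 1:i + 2]) / 3 for i in range(1, len(values) - 1)]


def standard_error(r, sigma=SIGMA_IQ):
    """Standard error of measurement, sigma * sqrt(1 - r) (assumed formula)."""
    return sigma * math.sqrt(1.0 - r)


smoothed = moving_average_3(av_r)
for r in smoothed:
    print(f"{r:.3f}  {standard_error(r):.1f}")
```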

We consider these final values to be reasonably accurate
estimates of the equivalent reliabilities and of the
errors of measurement, to be sufficiently refined for all
practical purposes, and to be decidedly superior to any
computed reliabilities which ignore the lack of
homoscedasticity in the scatter plots. Personal preference
and utility must be the deciding factors as to whether a
[...] given I.Q. level relative to the total spread of
I.Q.'s. They are not to be erroneously interpreted as
giving the accuracy relative to a narrow range of I.Q.'s
such as 80 - 89.


Plots of the smoothed reliability coefficients against
I.Q. magnitude, one for each of the three age groups,
reveal apparently curvilinear relationships. The same
holds true for the standard errors of measurement versus
I.Q. level, as can be seen in Figure 3. On the basis of
our argument that the errors of measurement for I.Q.'s
should vary with I.Q. magnitudes because of increasing
variability in M.A. distributions with age, we should
anticipate a linear relationship between the σe's and I.Q.
magnitude. This assumes that the reliability of the I.Q.
scale does not vary with age and that the spread (standard
deviation) of M.A. distributions increases with age at a
constant rate (.166 times C.A.; i.e. this assumption
implies that our computed value, namely 16.6, for the σ of
the I.Q. distribution holds true for all ages). The three
straight lines in the figure indicate the presumed
relationships.
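
The reasoning behind the expected straight lines can be written out as a short derivation. This is a sketch of one reading of the argument, not the author's own notation: m stands for mental age, a for chronological age (both in months), and r for the reliability, taken as constant over age. The second line, which lets the error of an M.A. score depend on the mental level reached, is an interpretive assumption.

```latex
% Assumed notation: m = mental age, a = chronological age (months),
% r = reliability, taken as constant over age.
\begin{align*}
\sigma_{\mathrm{M.A.}}(a) &= 0.166\,a
  && \text{(spread of M.A. proportional to age)}\\
\sigma_{e(\mathrm{M.A.})}(m) &\approx 0.166\,m\,\sqrt{1-r}
  && \text{(error governed by the mental level reached)}\\
\sigma_{e(\mathrm{I.Q.})} &= \frac{100\,\sigma_{e(\mathrm{M.A.})}(m)}{a}
  = 16.6\,\sqrt{1-r}\,\frac{m}{a}
  = \sigma_{e(\mathrm{I.Q.}=100)}\cdot\frac{\mathrm{I.Q.}}{100}
\end{align*}
```

The last step uses I.Q. = 100m/a, so that the expected error is a straight line through the origin when plotted against I.Q.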


Fig. 3. Observed and expected (dotted) values for σe
(based on Table 20). [Horizontal axis: I.Q., 65 to 145.]
Let us now compare our final errors of measurement for
I.Q.'s with those that one would expect to arise solely on
the basis of the increase with age of the variability of
M.A. distributions. If, for example, we choose the 6 to 13
group and if we accept .929 as the equivalent reliability
coefficient and 4.42 as the corresponding standard error
of measurement for I.Q.'s of 100, we can obtain the
expected standard error for any given I.Q. by multiplying
it, expressed as a ratio to 100, by 4.42. Thus for an I.Q.
of 125 we should expect the error to be 5.5 as compared
with an observed value of 5.3; the expected value for an
I.Q. of 75 would be 3.3, whereas the observed value is
3.4. These and the other values for ages 6 to 13 have been
set forth in Table 20, in which will also be found
corresponding figures for the two other age groups. The
expected values for the lowest age group are based on a
σe, for I.Q. = 100, of 5.14, while for the 14 to 18 group,
4.10 was used.
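
The expected values can be recomputed in a few lines. The sketch below assumes that each I.Q. interval is represented by its midpoint (145, 135, ..., 65), which is consistent with the worked figures for I.Q. 125 and 75 but is not stated outright; the results agree with the expected columns of Table 20 to within a tenth of a point.

```python
# Standard error at I.Q. 100 for each age group, as quoted in the text.
SE_AT_100 = {"2-1/2 to 5-1/2": 5.14, "6 to 13": 4.42, "14 to 18": 4.10}

# Midpoints of the I.Q. intervals 140-149 ... 60-69 (assumed to be the
# values at which the expected errors were evaluated).
MIDPOINTS = [145, 135, 125, 115, 105, 95, 85, 75, 65]


def expected_error(iq, se_at_100):
    """Expected standard error of measurement, proportional to I.Q."""
    return (iq / 100.0) * se_at_100


for group, se100 in SE_AT_100.items():
    row = [f"{expected_error(iq, se100):.1f}" for iq in MIDPOINTS]
    print(f"{group:>15}: " + "  ".join(row))
```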


TABLE 20

COMPARISON OF OBSERVED (O) WITH EXPECTED (E)
ERRORS OF MEASUREMENT

              2-1/2 - 5-1/2      6 - 13        14 - 18
I.Q.            O      E        O      E       O      E
140 - 149      6.8    7.4      5.1    6.4     3.8    5.9
130 - 139      6.5    6.9      5.3    6.0     4.1    5.5
120 - 129      5.9    6.4      5.3    5.5     4.4    5.1
110 - 119      5.5    5.9      4.8    5.1     4.5    4.7
100 - 109      5.3    5.4      4.5    4.6     4.3    4.3
 90 -  99      5.0    4.9      4.4    4.2     3.9    3.9
 80 -  89      4.9    4.4      4.0    3.8     3.4    3.5
 70 -  79      4.7    3.9      3.4    3.3     2.9    3.1
 60 -  69      4.9    3.3      2.8    2.9     2.4    2.7

It will be noted, from the figure and Table 20, that there
is a fair agreement between the linear expected and the
observed values for ages 6 to 13, the group which contains
the largest number of cases and for which the assumptions
underlying the determination of the expected values are
most nearly met. The failure, if non-chance, of the
observed values for the 130 - 139 and 140 - 149 levels to
agree with the expected values may be due to the brighter
11-, 12-, and 13-year-olds having mental ages in those
levels where there is no longer an increase in M.A.
variability. When the two sets of values for the 14 to 18
group are compared, we see that the agreement is close for
I.Q. levels below 110, whereas for higher levels the
accuracy does not decrease as expected. The explanation
for this is that there is no increase in M.A. variability
from ages 15 to 18, and therefore the factor which would,
according to our reasoning, lead to a decrease is not here
present. The agreement for ages 2-1/2 to 5-1/2 is not
particularly striking, but two factors enter here which
not only prevent concurrence but also act in such a way as
to provide explanations for the direction of the
discrepancies. One of these factors is the apparent, and
we fear real, decrease in I.Q. spread as we pass from ages
2-1/2 to 5-1/2. This means that the variability of the
M.A. distributions is not increasing as rapidly as assumed
in determining the expected values for σe; hence the
change in the observed σe's with I.Q. as we proceed upward
and downward from the 100 I.Q. level will be less than
anticipated. (As a matter of fact, a revision of our
expected values to allow for a smaller constant increase
in M.A. spread definitely swings, i.e. reduces the slope
of, the line of expected values so as to reduce the
discrepancies.) The other disturber operating in the 2-1/2
to 5-1/2 group is the fact that the scales are
progressively more reliable (as judged by form versus form
coefficients, adjusted for differences in I.Q. range) as
we pass from age 2-1/2 to 5. This also operates so as to
yield errors of measurement smaller than expected for
higher I.Q.'s and larger than expected for lower I.Q.'s.

In view of the general marked agreement between expected
and observed accuracy, and the very plausible explanations
for such discrepancies as occur, we are inclined to repeat
with still greater confidence the statement made in the
previous volume to the effect that the dependence of I.Q.
accuracy upon I.Q. magnitude is inherent in the I.Q.
technique and should have been anticipated on strictly
logical grounds.


TABLE 21

AVERAGE DEVIATIONS FOR TEST-RETEST OTIS
I.Q.'S AS FOUND BY HIRSCH

I.Q. levels      N     Av. dev.
131 up          29       7.1
116 - 130       55       5.5
 95 - 115      174       5.3
 94 down        85       4.3


It is extremely likely that this factor accounts for the
larger fluctuations of I.Q.'s, in test-retest or constancy
studies of the 1916 Revision, for superior as compared
with [be]low-average individuals. That the [...] He
reports the aver[...] for four I.Q. groups, [...] accuracy
for I.Q.'s [...] it is no longer necessary to [...] a
possibly greater irregularity of growth for the superior
or a supposed differential operation of motivational or
emotional factors.

¹ N.D.M. Hirsch, "An Experimental Stud[...] Children over
a Six-Year Period," [...] Psychol. Monogr., 1930, 7,
487-547.

So far our discussion has pertained primarily to the
errors of measurement for I.Q.'s even though the real
source of error is, of course, in the M.A. scores, but
knowing either error we can secure an approximate value
for the other by use of the general relationship
σe(M.A.) = (C.A.)σe(I.Q.). This would need to be modified
for individuals of 14 years and over because of the
substitute C.A. divisor. The standard errors of
measurement for M.A. scores, as given in Table 22, have
been ascertained, however, by two schemes which will be
described briefly. By the first method, we obtained from
the separate age scatter plots, for I.Q. differences
versus magnitude of I.Q., a regression estimate of the
difference in I.Q. for those who score at age, i.e. have
I.Q.'s equal to 100. This difference in I.Q.'s times the
C.A. gives the corresponding difference in M.A. score, and
from this last difference we get an estimate of the error
for an M.A. score by multiplying by ½√π. The second method
involved plotting the actual differences between the
mental-age scores derived from the two forms against
mental age, then computing an average difference for the
several M.A. levels, 30-39, 40-49, etc., and again
multiplying by the appropriate constant to obtain
σe(M.A.). A freehand or graphic method of smoothing the
results from these two methods yielded the values in Table
22. We believe the values so obtained to be satisfactory
approximations for the errors for M.A.'s below 180 months.
It would seem safe to accept 7.5 as a fair estimate of the
standard error of measurement for M.A.'s higher than 190
months, but we are not sure some other value between 7.0
and 8.0 is not just as correct.
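
The step from an average difference to a standard error can be sketched as follows. The constant ½√π (about .886) is the normal-theory factor for an average absolute difference between two equally reliable forms; treating it as the constant intended here is an assumption, as is the division by 100, which applies when I.Q. is expressed on the usual 100-point scale. The function names and the numbers in the illustration are hypothetical.

```python
import math


def se_from_mean_abs_difference(mean_abs_diff):
    """Standard error of a single score from the average absolute
    difference between two equivalent forms.

    Assumes normally distributed, independent errors of equal size on
    both forms, so that mean |d| = 2 * sigma_e / sqrt(pi).
    """
    return 0.5 * math.sqrt(math.pi) * mean_abs_diff


def iq_difference_to_ma_difference(iq_diff, ca_months):
    """Convert a difference in I.Q. points to months of mental age.

    Follows from I.Q. = 100 * M.A. / C.A.; the division by 100 applies
    when I.Q. is expressed on the usual 100-point scale.
    """
    return iq_diff * ca_months / 100.0


# Hypothetical illustration: an average form-to-form difference of 5 I.Q.
# points for children of C.A. 96 months who score at age.
ma_diff = iq_difference_to_ma_difference(5.0, 96)
print(round(se_from_mean_abs_difference(ma_diff), 2))
```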

It should be noted that the values in Table 22 are for the
particular M.A.'s given and not for a group of values;
e.g. 2.9 is the error for an M.A. score of 60 months.
Linear interpolation can be used for in-between values.
Those who attempt to check back and forth from σe(I.Q.) to
σe(M.A.) will find apparent discrepancies which are to be
explained on the basis of the values for I.Q.'s having
been obtained from age combinations with no allow[...]
combination; wher[...] for M.A.'s will ac[...] with age,
[...]ting may not lead to as [...]nts would indicate.


TABLE 22

STANDARD ERROR OF MEASUREMENT IN MONTHS
FOR MENTAL AGE SCORES (APPROXIMATE)

M.A.    σe(M.A.)        M.A.    σe(M.A.)
180       7.0           100       4.6
170       6.5            90       4.2
160       6.2            80       3.9
150       6.0            70       3.5
140       5.8            60       2.9
130       5.5            50       2.5
120       5.2            40       2.2
110       4.9            30       1.9

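
Linear interpolation between the tabled points can be sketched as below. The dictionary simply transcribes Table 22; the function name is hypothetical, and values outside the tabled range of 30 to 180 months are rejected rather than extrapolated.

```python
# Table 22: standard error of measurement (months) at selected M.A. values.
TABLE_22 = {
    30: 1.9, 40: 2.2, 50: 2.5, 60: 2.9, 70: 3.5, 80: 3.9, 90: 4.2, 100: 4.6,
    110: 4.9, 120: 5.2, 130: 5.5, 140: 5.8, 150: 6.0, 160: 6.2, 170: 6.5,
    180: 7.0,
}


def se_for_mental_age(ma_months):
    """Linearly interpolate the Table 22 values for an M.A. in months."""
    points = sorted(TABLE_22)
    if not points[0] <= ma_months <= points[-1]:
        raise ValueError("M.A. outside the range covered by Table 22")
    for lo, hi in zip(points, points[1:]):
        if lo <= ma_months <= hi:
            frac = (ma_months - lo) / (hi - lo)
            return TABLE_22[lo] + frac * (TABLE_22[hi] - TABLE_22[lo])
    return TABLE_22[points[-1]]


# An M.A. of 65 months lies halfway between the tabled 2.9 and 3.5.
print(round(se_for_mental_age(65), 2))
```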
SPREAD OF INDIVIDUAL PERFORMANCES ¹


During the standardization-testing period, it was 
noticed that the spread of performance, i.e. the failing 
of tests below and the passing of tests above an individ- 
ual’s general mental level, seemed to show greater vari- 
ation than that generally observed for the 1916 Revision. 
It was thought that this greater spread might, at least 
in part, be due to the fact that certain test items were 
poorly located in the preliminary forms, but subsequent 
use of the final forms, both at Stanford and elsewhere, 
has revealed what seems to be a rather wide scatter of 
passes and failures. It is the purpose of this chapter 
to discuss, and to set forth some possible explanations 
for, this variability, and to present the results of an at- 
tempt to analyze the standardization data in such a man- 
ner as to check on certain hypotheses regarding its cause. 

That the performance of nearly all individuals must show
some scattering of passes and failures is a foregone
conclusion. This is evident when we recall that the scales
are made up of items which yield per cent passing with age
curves which are none too steep and items which are not
perfectly intercorrelated. Insofar as items may measure a
group function in addition to the general and specific
abilities, and insofar as these abilities, group and
specific, may develop at different rates within the
individual, it follows that an individual is apt to fail
items below or pass items above his general level of
performance. Another obvious reason for variability is the
relative unreliability of single items. It should also be
noted that the spread of performance or variability will
increase from the lower to the higher age levels. Whether
this increase in absolute spread is more apparent than
real can only be answered when a satisfactory answer is
given to the larger problem to which this is related, i.e.
the problem of increasing individual differences with age.
The fact that the curves of per cent passing with age,
though steep at the earlier ages, tend gradually to
flatten as one approaches maturity, is sufficient cause
for an increase in apparent individual variability with
age.

¹ The investigation reported herewith was financed in part
by a grant from the Committee on Psychology and
Anthropology of the National Research Council.

But these factors which we have just enumerated 
are not necessarily valid reasons for the observed great- 
er scatter on the new forms as compared with the 1916 
Scale. One possible explanation for the greater variation 
is an artifact of certain differences between the old and 
new scales. The presence in the new scales of test 
items at age levels 11 and 13 and the additional items 
at the upper age levels is the one definite reason for an 
apparently greater spread for individuals of ages 9 to 15. 
Likewise the presence of tests at half-age levels at the 
lower end of the scale will result in an apparently
greater spread. It cannot be argued that the larger scatter, 
if it really is larger, on Forms L and M is due to faulty 
placement of items; nor can it be said that it is due to 
the inclusion of items which show a wider age range for 
passes and failures than the items in the 1916 Revision. 
Aside from the aforementioned location of items at ad- 
ditional age levels, we see no real reason for a greater 
spread of performance on the New Revision. The presence of
spread, however, does call for study, so we are here
reporting an attempt, none too fruitful, to investigate
two aspects of the problem.

First we have sought an answer in terms of the nature of
the items to the question: Is the spread of passing and
failing a function of the scale itself in that certain
items, or sequences of items, tend to measure abilities
other than the general ability demanded by all items? This
cannot be answered by recourse to the results of the
factor analyses, since those analyses were necessarily
confined to contiguous items. In a second line of inquiry
we have set out to learn whether or not the spread or
variability of performance is a function of the
individual. Here we are granting that the instrument is
such as to permit variability of performance, and we are
raising the question as to whether individuals are
consistently variable or nonvariable. Later it will be
seen that the answer to this last question will need to be
modified in light of the answer given for the first
question.


Spread as a Function of the Scale 


The question regarding the spread of passing and 
failing as a function of items in the scale can be sub- 
divided into more specific questions. To what extent is 
the spread due to items which recur with different pass- 
ing standards at different ages, to items which are high- 
ly similar, such as repeating digits and other recurring 
test situations, and to items which seem to have some- 
thing in common over and above the general factor? The 
most satisfactory answer to the last two parts of this 
question could be determined by factor analyses far more 
extensive than is feasible with so few as 100 to 200 cases 
at an age level. In fact it would require several thousand 
cases at each level to compensate for the error involved 
in extreme dichotomies (for passing and failing) for those 
items four or five age levels removed from the given age. 

Our approach has been by way of an examination 
of the patterns of failing (and passing) for those individ- 
uals whose spread seems abnormal. In order to make 
such an analysis, the spread for all cases on each form 
was tabulated, separately for each age level, in such a 
manner as to yield distributions which showed the ex- 
tent of passing above, and failing below, each individual's 
mental-age level. This was not done for ages below 4 
and above 15, since the spread downward and upward
respectively for levels beyond these is automatically
curtailed. On the basis of these distributions it was
possible to pick out those cases with spread of passes
extremely above their mental age level and those cases
which showed extreme downward variability for failures
below their M.A. level. An individual might be included in
both extremes - the extent to which spread upward is
correlated with spread downward will be reported later in
this chapter. The definition of ‘extreme’ spread is
arbitrary - in order to have sufficient numbers, about 28
per cent of the cases have been designated as extreme. It
is of course very doubtful whether so large a percentage
can be considered as really representing unusual
variability.

A record was made of the particular items passed at all
levels above an individual’s M.A. level for each of the
709 subjects chosen as showing extreme passing spread, and
a similar record was made of the failures below M.A. level
for each of 757 extreme cases of downward spread. Then a
tabulation was made of the patterns of items failed by
those with downward spread of failures, and separately for
the patterns of successes for those with upward variation
of passes. In determining the frequency of occurrence for
the patterns, the data were grouped according to M.A.
levels, since individuals of the same M.A. level have more
trials or attempts in common than those of the same C.A.
In so arriving at [...]s, the results for Forms L [...] As
one might expect, the to[...]ch occurred was very great
[...] levels, but the solution of this problem would
involve empirical probabilities for the passing (or
failing) of single items by an individual of a specified
M.A. Even if this could be resolved, the chi square
comparison of observed frequencies with expected
frequencies, so deduced, would be highly questionable
because of the very small expected frequencies.

It should be noted that the tabulations yielded the 
frequencies for the patterns of failures (or passes) as 
they occurred two, three, four, five, or as many as six 
levels below an individual’s M.A. level (above for passes). 
Thus the record of the items failed (or passed) adjacent 
to the given M.A. level could be examined in order bet- 
ter to interpret the patterns of items failed (or passed) 
at levels remote from an individual’s mental age. For 
example, 28 of the 110 individuals having mental ages 
from 11 to 12 on Form L passed test S.A.I, 3. An ex- 
amination of the other items passed at this extreme lev- 
el and the two adjacent levels, A.A. and 14, by these 
individuals does not lead to a plausible explanation for 
this spread upward of passing behavior, but when the 
items which were passed nearer the 11-year level were 
scrutinized, a definite reason was found for this extreme 
passing performance: item S.A.I, 3, ‘Minkus completion,’ 
recurs from age level 12. Any individual who happens 
to possess the specific ability called for by this test and 
who gets three sentences completed, automatically re- 
ceives credit for this test at the higher level. This ex- 
ample, which is unusual in terms of frequency, is in line 
with one of our original hypotheses; namely, that a part 
of the spread is due to the recurring tests. 

The presentation here of the tabular and numeri- 
cal data which accumulated during this quest is certainly 
not feasible, hardly necessary, and perhaps unwarranted 
when one considers the complexity of the data and their 
inappropriateness to statistical treatment. We shall, 
therefore, present such results and conclusions as seem 
to emerge rather clearly when the patterns are carefully 
examined. It is admitted that this may not be very
satisfactory to the reader, but we feel certain that the
data do not justify more elaboration.

Let us first consider the passes for those showing an
extreme upward spread from their M.A. levels. On Form L,
in addition to the ‘Minkus completion’ test cited above,
we find that a few patterns involve ‘reconciliation of
opposites,’ a test located at level A.A. and recurring at
S.A. II. Aside from these two recurring tests plus
‘abstract words,’ which has a negligible frequency, one
must search elsewhere for an explanation for most of the
spread of passing on Form L for mental levels four to
average adult. The recurring ‘vocabulary’ test and
‘repeating digits’ (recurring test situation) are both
conspicuous by absence from the patterns. But since the
‘vocabulary test’ is highly saturated with the general
function measured by the entire scale, one would not
expect, on the basis of our hypothesis, to find that it
contributed to variability of performance. Its specific
part (factor) is indeed small. On Form M, four recurring
tests (‘picture vocabulary,’ ‘orientation: direction,’
‘reconciliation of opposites,’ and ‘ingenuity’) would seem
to have produced a part of the extreme passing
performance. The ‘Minkus completion’ test of Form M is not
involved (a marked inconsistency with the finding for Form
L), while ‘repeating digits’ and ‘picture absurdities’ as
recurring test situations occur in the patterns with
negligible frequencies.

When we turn to failures below mental age, we first note
that the spread downward is greater than the upward spread
of passing, and consequently the number of patterns and
possible patterns is much greater for failures below than
for passes above a given mental level. For Form L, the
following recurring tests occurred rather frequently in
the patterns: ‘picture vocabulary,’ ‘three-hole form
board: rotated,’ ‘pictorial identification,’ ‘picture
completion: man,’ ‘vocabulary,’ ‘identifying objects by
use,’ ‘memory for designs,’ and ‘paper cutting.’ On the
other form four recurring tests were found to be involved:
‘identifying objects by use,’ ‘patience: pictures,’
‘verbal absurdities,’ and ‘abstract words.’ These tests
appeared in only a fraction of the total number of
patterns, but those patterns containing them (except
‘vocabulary’) occurred much more frequently than other
patterns; i.e. a rather large number of individuals was
affected thereby.

Five items which recur not as the same test but 
as a similar test situation (‘picture absurdities,’ ‘com- 
prehension,’ ‘verbal absurdities,’ ‘digits,’ and ‘memory 
for sentences’) were found to be possible contributors to 
failing spread on Form L; and on Form M the following
test situations were involved: ‘verbal absurdities,’
‘memory for sentences,’ ‘abstract words,’ ‘memory for
designs,’ and ‘picture absurdities.’ These recurring test
situations do not automatically produce a failure, as does 
a poor performance on an actual recurring test, but in 
case an individual is somewhat lacking in the specific 
ability needed and in case he meets the situation some- 
what below his general level of performance, he will be 
apt to fail and thereby add to his variability score. 

It is thus seen that a part of the spread of per- 
formance is definitely linked to recurring tests and re- 
curring test situations, but it cannot be claimed that 
this is an explanation for anything like all of the extreme 
variation. There may, however, be an indirect connec- 
tion between failing to meet the criterion for passing a 
recurring test and the failing of other items at the lower 
level to which an individual is thereupon taken back.
Some of these easier items may not be of sufficient
challenge to him. It is interesting to note that only
about one-half the recurring tests and recurring test
situations which are included in Forms L and M were
involved in the patterns which we have examined. The
reason for this seems obscure.

Spread as a Function of the Individual 


That the variability of performance on examina- 
tions of the Binet type is a characteristic of the individ- 
ual has been frequently postulated, and Kuhlmann in his 
1939 Tests of Mental Development presents norms for va- 
riability scores. He does not, however, claim any sig- 
nificance for his measure of variability, but rather gives 
norms so that if, and when, psychological or clinical 
meaning can be attached to a variability score, they will 
be available. One of the first questions to be raised
about a variability score is its stability from test to
test; i.e. does an individual show consistently high or
low variability about his general level of performance? In
answering this question, we shall accept the usual
variability score as the distance, in terms of age levels,
from an individual’s basal mental age to the highest level
at which tests are passed.
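
The variability score just defined can be made concrete with a small sketch. The record format and function name are hypothetical, and the basal level is taken to be the highest level at and below which every test is passed, which is an assumption about the scoring convention. Each level counts as one step, whether a half-age or a full-age level.

```python
def variability_score(record):
    """Number of age levels from the basal level to the highest level
    at which any test is passed.

    `record` is a list of (level_name, [True/False per test]) pairs in
    ascending order of difficulty. The basal level is taken here to be
    the highest level at and below which every test is passed; this is
    an assumption about the scoring convention.
    """
    basal_index = None
    for i, (_, results) in enumerate(record):
        if all(results):
            basal_index = i
        else:
            break
    if basal_index is None:
        raise ValueError("no basal level: the lowest level has a failure")
    highest_pass = max(i for i, (_, results) in enumerate(record) if any(results))
    return highest_pass - basal_index


# Hypothetical record: six tests per level, year levels VI through X.
record = [
    ("VI",   [True] * 6),
    ("VII",  [True, True, True, False, True, True]),
    ("VIII", [True, False, True, False, False, True]),
    ("IX",   [False, False, True, False, False, False]),
    ("X",    [False] * 6),
]
print(variability_score(record))
```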

Aside from the consistency of variability scores, there is
another aspect of variation which may be of some interest,
namely the relationship of upward spread of passes to
downward spread of failures. Let us first present some
data on this point. The correlation (tetrachoric because
of fewness of categories) of upward versus downward
variation was determined for three mental-age groupings,
the groupings being made separately for, and on the basis
of mental ages on, Forms L and M. Although for a
particular mental-age group, the cases for the Form L
correlation will not be exclusive of those involved in the
correlation for Form M, the cases will not be exactly the
same and the N’s need not agree. These correlational
results are given in Table 23, from which it can be seen
that there is only a slight relationship between upward
and downward (from an individual's M.A.) spread of
performance. Insofar as passing above reflects high
motivation and failing below represents poor effort, the
lack of correlation is not surprising. Variability of
motivational factors for an individual during the test
administration would, however, tend to produce
correlation. The above correlations include a small
spurious or artifactual element in that both the upward
and downward spreads are measured from the individual’s
M.A. level, which in a sense is an average of the two
variations. For instance, it is unlikely that a high
upward spread could be accompanied by no failures below
one’s mental-age level.


TABLE 23

CORRELATION BETWEEN UPWARD AND DOWNWARD
SPREAD OF PERFORMANCE FOR CERTAIN LEVELS

Mental age       54-59        120-131       144-155
Form            L     M       L     M       L     M
N             113   122     204   201     201   198
rt            .20   .18     .17   .06     .27  -.03
σr (r=0)      .18   .15     .11   .11     .11   .11

In order to obtain some information on the consistency of
variability scores, we have correlated the variation score
as determined from Form L with that secured on Form M for
six groups at different levels of maturity. As in the case
of the correlations just reported, mental-age groups were
used instead of life-age groups because variability of
performance as measured is not correlated with C.A. for
constant mental age whereas it is related to M.A. when
C.A. is held constant. Mental age is in this case based
upon the composite of Forms L and M.

In reporting the resulting product-moment correlations and
means and standard deviations (see Table 24), we have not
made any correction for two factors which disturb these
values for the three lower groups. These are (1) the
fewness or coarseness of the categories and (2) the
counting, in determining the variability score, of spread
over half-age and full-age levels as though they were of
the same value. Since the tetrachoric r’s, which will not
be much affected by either of these factors, are in close
agreement with the reported product-moment coefficients,
we assume that the error introduced is not such as to
disturb seriously our general conclusion to the effect
that these data lead one to question the consistency or
reliability of individual variability scores.


TABLE 24

CORRELATION FOR VARIABILITY SCORE ON FORM L
WITH THAT ON FORM M

M.A. group   42-47   60-65   84-95   108-119   120-131   144-[...]
N              95     125     187      174       204       [?]
M (L)         4.15    3.36    4.32     6.07      6.28      [?]
M (M)         4.48    3.96    4.26     6.06      6.31      6.7[?]
σ (L)         1.02    1.35    1.41     1.57      1.46      1.52
σ (M)         1.04    1.1[?]  1.60     1.35      1.44      1.63
r            -.077    .304    1967[?]  2560[?]   .190      .059

[...] this time, it should be pointed out that insofar as
spread of performance is a function of the nature of the
scales, particularly the presence of highly similar tests
(such as recurring test situations) which introduce or
permit narrow group factors, and in case the same narrow
factors are involved in both forms, we have a condition
which would tend to produce some correlation between
variability on the two forms, [...] that these
correlatio[...] for the ‘reliability’ [...] this be true,
and u[...] by unknowns, we are for[...]nificance can be
attached [...]dividual variation.

[...] age implies, of course, that variability is not
related to brightness when mental maturity is kept
constant. In order to check on this point, four different
M.A. groups were chosen and the correlation between
variability and I.Q. was determined. This was done for
Form L only. The results are given in Table 25, from which
we infer that the relationship is possibly negative, but
the r’s are so near zero that very little of the
variability variance can be attributed to, or associated
with, brightness.


TABLE 25

CORRELATION BETWEEN VARIABILITY AND BRIGHTNESS
OR I.Q. FOR CONSTANT M.A. GROUPS. FORM L

M.A. group    72-83    96-107    120-131    144-155
N              263      196        204        201
r             -.097    -.016      -.097      -.098
σr (r=0)       .062     .071       .070       .071
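
How near zero these correlations are can be seen by expressing each r in units of its standard error. The sketch below assumes that the tabled σr (r=0) is 1/√N, which reproduces the four tabled values to three decimals; the formula itself is not stated in the text.

```python
import math

# Table 25: correlation of variability with I.Q. on Form L.
groups = ["72-83", "96-107", "120-131", "144-155"]
n = [263, 196, 204, 201]
r = [-0.097, -0.016, -0.097, -0.098]


def se_zero(n_cases):
    """Standard error of r when the true correlation is zero (assumed 1/sqrt(N))."""
    return 1.0 / math.sqrt(n_cases)


# Each correlation divided by its standard error.
ratios = [r_i / se_zero(n_i) for r_i, n_i in zip(r, n)]

for label, n_i, ratio in zip(groups, n, ratios):
    print(f"{label:>8}  sigma_r = {se_zero(n_i):.3f}  r / sigma_r = {ratio:+.2f}")
```

None of the four ratios reaches 2 in absolute value, which is in keeping with the inference drawn from the table.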


In closing this chapter, a few limitations of the 
data analyzed herein should be mentioned. The stand- 
ardization-testing involved preliminary forms which con- 
tained more items than the final forms, a different order 
for some of the items, and in some cases rather poorly 
placed items as regards difficulty. It is not possible to 
say how much these factors would invalidate such con- 
clusions as have been drawn; or, to put it differently, we 
cannot be positively sure that the results are the same 
as would be found if similar analyses were made on new 
data based upon examinations with the final forms. Our 
conjecture is that substantially the same findings would
emerge.

In constructing an age scale, the per cent passing an item
at successive age levels takes on considerable importance.
In the first place, such percentages supply some
necessary, though not sufficient, information for
establishing the validity of an item. In the second place,
they provide the necessary data for arranging items
according to difficulty, i.e. locating the items in the
scale. These two main points are so well known that one
should not need to discuss them, but since some people
persist in misunderstanding the role played by curves of
per cent passing by age, it may not be amiss to
recapitulate briefly the rationale underlying their use.

To determine the validity of an intelligence-test item is
a far greater task than to ascertain its difficulty. One
begins either with a common-sense notion of intelligence,
usually delimited as to kind, or with a high-powered
definition couched in the jargon of contemporary
psychology. Then one searches for items which will,
according to the best judgment, provide behavioral
situations calling for the kind of intelligence implied in
the definition. That a particular psychologist’s
definition of intelligence is not the main determiner of
the outcome is attested to by the fact that the tests
constructed by individuals having different conceptions of
intellect tend to be highly intercorrelated. This, of
course, does not prove that the items selected a priori by
the several test-makers are valid; it merely demonstrates
that there [...]

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Now, there would seem to be agreement that gen- 
eral intelligence is a characteristic which develops with 
age, so it is entirely logical to lay down the require- 
ment that an item cannot be regarded as valid unless it 
yields a larger per cent passing for successive age lev- 
els through childhood. It should go without saying, how- 
ever, that this requirement in and of itself does not 
guarantee validity. For instance, the ability to throw a 
baseball a distance of twenty-five feet will show an in- 
crease in success with age, but such an item would, for 
obvious reasons, never reach the tryout stage. All test-constructors
have recognized the necessity of other criteria
of validity in addition to increase in per cents passing
at successive age levels. Among the additional criteria
most frequently used are correlations with subjective
ratings of intelligence, correlations with scores
earned on other intelligence scales, and correlations with
total score on the battery of which the item is a part.
The merits and limitations of these and other criteria
have been discussed in Chapter I. As there stated, the
elimination of test items from the trial series for the
new Stanford-Binet revision was based in part on correlation
with the composite of L and M scores, a procedure
which insures that retained items will be saturated with
a common factor.

With regard to the arrangement of items according
to difficulty, and their allocation to a given age level,
it should first be remarked that no attempt was made to
arrange the items within an age level in order of difficulty.
This, despite the carping of a few critics, cannot
be regarded as a serious drawback, since the differences
in difficulty within a level are usually small. Besides,
there are times when psychological reasons should take
precedence over purely statistical dictates, particularly
when the statistical differences are of negligible practical
significance.

Many have pointed out the difficulties in constructing
an age scale, but only those who have been through
the mill are in a position to appreciate fully the obstacles.
At times, compromises must be made, for
contingencies cannot be foreseen. It happens that
items were retained which were not entirely satisfactory
as regards their final allocation, or passing curves. For
instance, it will be seen from the table of per cents
passing that item M, III, 4 is easier than M, II-6, 3, but
the two could not be switched because the latter item
was needed at level III, with a higher passing standard.
Item L, V, 3 would appear to be much easier for ages
3-1/2 and 4 than the other items at level V; and M, VII,
2 is somewhat more difficult for age 8 and up than the
other items located at level VII. Items L, XIV, 2 and
M, XIII, 2 yield curves which tend to flatten too much
(i.e. more than desirable) for the four upper age groups.
Test L, XI, 1 would seem to be the worst misplaced of
all, as it corresponds in difficulty to items located at
level XIII.

A number of clinicians have expressed the belief
that many items are no
difficulty, 
groups; but 
actual data r 


be based on representative samplings. Furthermore, the 


something that can be established once and for all. It 


C. H. Growdon, "Is the Revised Stanford-Binet Scale Really
an Age Scale?" Psychol. Bull., 1940, 37, 512. (Abstract:
the writer heard this paper at the Pennsylvania State College
A.P.A. meeting.)
countries. It is not surprising, for example, that Burt¹
finds the order of difficulty of the New Revision items
for London children somewhat different from that for
the American standardization group. One is a little
puzzled, however, as to the meaning of Burt's assertion
that ‘there seems to be no fixed order at all: what is
easier for one child may be harder for another.’

We would also call attention to the fact that the
curves of per cents passing by age can affect the variability
of I.Q.'s. The reader may recall the discussion
in Measuring Intelligence (page 40) concerning the rather
marked fluctuation in standard deviations of I.Q.'s for
various age groups. In particular it was noted that the
standard deviations were decidedly too low for age 6, and
somewhat high for ages 2-1/2 and 12. At the 1937
writing no explanation had been found for these apparent
facts, but it was pointed out that there was nothing in
the samplings to suggest that the atypical standard deviations
might be due to selective factors. Since then, a
closer scrutiny of the curves for per cents passing has
convinced us that the difference in variability is an artifact
of the scale. It results from strange, and undetected,
accidents. It is well known that the extent of variability
is partly a function of item difficulty. It so happens that
no items yield curves for per cents passing which cross
the ordinate for age 6 between the 35 and 65 per cent
levels of difficulty. This fact will definitely result in a
narrower spread of M.A.'s and I.Q.'s for 6-year-olds
than would have resulted had this imperfection been absent.
The same situation as regards lack of items of
medium difficulty exists for ages 5 and 5-1/2, though not
so markedly as for age 6. The greater variability at
ages 11 and 12 may be due to a concentration of items
of medium difficulty for these ages, a concentration
which is greater than that at other ages, except at
2-1/2, 3, and 3-1/2. There is, however, nothing about
the item difficulties for age 15 which would enable one to
predict the rather large S.D. for I.Q.'s at that age. Although
it is unfortunate that these differences in variability
should exist as an artifact of scale construction, it is
somewhat satisfying to know that one need not entertain
the psychoanalytic explanation offered by Bellak.¹
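
The dependence of variability on item difficulty can be illustrated with the variance of a single pass/fail item, p(1 - p), which is largest when 50 per cent pass. The following is a minimal sketch only; the per cents are illustrative and are not taken from Table 26, and `item_variance` is a hypothetical helper name.

```python
def item_variance(per_cent_passing: float) -> float:
    """Variance of a pass/fail item scored 1 or 0, given the per cent passing."""
    p = per_cent_passing / 100.0
    return p * (1.0 - p)


# Illustrative per cents only: items of medium difficulty contribute the
# most variance, very easy or very hard items the least.
for pct in (10, 35, 50, 65, 90):
    print(pct, round(item_variance(pct), 4))
```

An age group that meets no items in the 35 to 65 per cent band is therefore deprived of exactly those items that spread scores most.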

Anyone who examines the data of Table 26 for per
cents passing will note that the plan of the 1916 revision
has again been used in allocating items to the several
age levels. This procedure involves a shifting standard
as to difficulty as one proceeds from the lower to higher
age levels. For instance, the items located at level II
are of such difficulty that they are passed by about 77
per cent of 2-year-olds; the level V items are passed by
about 70 per cent of 5-year-olds; those at VIII by about
63 per cent of 8-year-olds, etc. The large jump in difficulties
at the adult levels is, of course, necessary in
order to provide ‘top,’ but our chief interest just now
concerns the relationship between difficulty and item
placement.

It is quite evident that there is considerable misunderstanding
of the issues involved here. A very unusual
misconception comes from the pen of M. W. Richardson.²
It might have been expected that a critic of
ave better informed himself about 
age scales before making the statement that ‘the age at 
еп pass the test is taken as the 
Some five pages later Rich- 
believing that subtests are 
assigned to the yearlevel 

Guilford³ evidently holds
to the same view. He states that ‘a proportion of 50 per
cent would have been a better criterion of age level. A
knowledge of psychophysical procedure, as in the constant
methods, would have suggested this criterion.’

It seems to us that the misunderstanding about the 
Binet method of a sliding scale of difficulty for allocating 
items to age levels may be due to failure to appreciate 
any one or more of the following considerations. (1) The 
fact that it is simply impossible otherwise to construct 
an age scale of the Binet type that will yield mean mental 
ages equal to mean chronological ages. (2) The fact that 
the location and grouping of items at a given level is 
mainly one of convenience which facilitates testing and 
scoring. This convenient way of arranging tests is of 
course closely related to the first consideration. (3) The
fact that the individuals of any age group encounter items
which are actually of 50 per cent difficulty for their age
group even though the items placed at their own age level
may be less difficult. The criticism for not ‘properly’
locating items is actually invalidated by this third
point.
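
The third consideration can be made concrete by asking at what age a given item reaches 50 per cent passing. The sketch below interpolates linearly along a curve of per cents passing; the curve is hypothetical (it is not a row of Table 26), and the function name `age_at_per_cent` is an assumption made for illustration.

```python
def age_at_per_cent(ages, per_cents, target=50.0):
    """Linearly interpolate the age at which a rising curve first reaches target."""
    points = list(zip(ages, per_cents))
    for (age0, pct0), (age1, pct1) in zip(points, points[1:]):
        # Use the first segment that brackets the target and actually rises.
        if pct0 <= target <= pct1 and pct1 > pct0:
            return age0 + (age1 - age0) * (target - pct0) / (pct1 - pct0)
    return None  # the target is not bracketed by the data


# Hypothetical curve for an item allocated to level V (about 70 per cent at age 5).
ages = [3.5, 4.0, 4.5, 5.0, 5.5, 6.0]
per_cents = [20, 30, 50, 70, 80, 90]

print(age_at_per_cent(ages, per_cents))        # age of 50 per cent difficulty
print(age_at_per_cent(ages, per_cents, 70.0))  # age at the level-V standard
```

With these illustrative figures the item is of 50 per cent difficulty for children half a year younger than the level at which it is placed, so each age group still meets items of medium difficulty for that age.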

The presentation of Table 26 for per cents pass- 
ing each item by age will no doubt become an invitation 
to some to use these data for establishing a growth 
curve. Perhaps a new ‘absolute’ zero point for intelli- 
gence may be found, and no doubt such a curve (or 
curves if the items for Forms L and M are treated sep- 
arately) will shed apparent light on the question as to 
when mental maturity is reached. These data, however, 
may not be entirely satisfactory for studying mental 
growth. As in all cross-sectional studies, one must be 
reasonably sure that the several age samples are strict- 
ly comparable as regards their representativeness. Any 
selective factors present will tend to distort the derived 
growth curve, and such distortion can become a serious 
restriction to extrapolation. It happens that our preschool
samples and those for the top two or three ages
are not representative. This fact led to an intentional
adjustment in the direction of permitting mean I.Q.'s
above 100 for children of our sample at these levels.
Whether or not the allowance has been adequate will become
more or less evident with use. However, the
pertinent point to note here is that the per cents for
passing have not been adjusted for known inadequacies in
the sampling procedure. This fact is, we believe, sufficient
to render highly questionable the meaning of growth
curves which might be derived from these data by any
of the so-called scaling methods.
(Names of tests may be found in Appendix C)


Age
1-1/2 2 2-1/2 3 3-1/2 4 4-1/2 5
Item
L,11,1* 18 19 97 99 98 98 100 
2 37 86 95 97 99 95 99 
3 11 74 92 92 99 94 99 
4 23 67 85 92 97 99 100 
5 13 79 92 94 99 98 100 
6* 28 T 98 96 100 98 99 
a 28 64 84 93 94 97 98 
M,II,1 50 80 83 90 96 99 100 
2 39 81 98 98 99 98 100 
3 10 68 97 93 99 98 99 
4* 18 19 97 99 98 98 100 
5 14 12 92 93 99 98 100 
6* 28 17 98 96 100 98 99 
а 0 61 88 90 99 99 99 
L,II-6,1 10 52 77 89 95 97 98 100
2 5 58 82 88 97 97 99 100 
3 0 24 76 84 99 98 99 99 
4 0 23 77 88 эт 98 98 100 
5 0 38 74 86 92 98 97 99 
6 5 45 72 79 92 95 97 99 
а 18 65 87 91 98 98 99 100 
M,II-6,1 2 26 76 86 96 98 98 100 
2 10 45 80 92 97 99 100 100 
3 0 26 64 81 96 97 98 99 
4 0 32 76 59 99 98 99 100 
5 2 38 72 83 97 98 96 100 
6 m 45 68 84 94 97 98 100 
a 2 17 т 92 98 99 100 100 


*Item duplicated on other form.
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№ 3 та 
Item 
L,III,1 0 9 50 73
2 оп 51 73 
3* 0 5 36 73 
4 6 27 52 65 
5 0 12 30 62 
6 0 15 55 16 
a* 2 23 49 65 
M,III,1* 0 5 36 73
2 0 т 44 69 
3 0 13 51 67 
4 6 27 63 87 
5 0 6 42 69 
6 0 8 38 64 
a* 2 23 49 65 
L,III-6,1 10 48 62 
2 1 13 32 
3 1 14 41 
4 3 27 52 
5 11 42 62 
6 6 22 38 
* 0 14 а 
M,III-6,1 1 19 58
2 2 9 38 
3 0 8 29 
4 2 17 39 
5 0 22 54 
6 5 34 49 
* 1 9 29 


*Item duplicated on other form, 
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Age 
4 


61 
79 
74 
77 
64 
62 
73 


82 
77 
62 
72 
72 
81 
70 


55 
47 
53 
38 
53 
48 
43 


56 
67 
42 
58 
59 
59 
60 


91 


95 


94 
100 
88 
91 
97 
97 
88 


79 
77 
82 
77 
75 
78 
77 


83 
87 
78 
81 
85 
86 
94 


100 


100 
95 


100 


96 
96 
97 


100 


100 


100 


100 
100 

99 

99 
100 
100 
100 


100 
100 

99 
100 
100 
100 

99 


TABLE 26 (Cont.) 
PER CENTS PASSING ITEMS BY AGE 
Age 


2-1/2 3 3-1/2 4 4-1/2 5 5-1/2 6 7 8 9


Item 
L,V,1 3 6 19 27 50 66 75 86 97 100 
2 1 2 19 38 65 82 85 92 97 100 
3 11 18 49 65 66 86 94 94 96 98 
4 0 0 4 16 50 67 82 95 99 100 
5 2 2 24 27 52 64 16 83 90 94 
6 2 3 22 30 50 68 81 91 99 99 
a* 0 з 6 20 44 69 74 92 98 100 
M,V,1 6 15 33 46 66 78 92 93 98 100 
2 3 10 24 39 50 79 87 96 99 100 
3 0 0 13 25 51 70 72 84 98 100 
4 1 4 16 33 51 63 63 77 92 98 
5 0 2 18 22 43 65 16 90 96 99 
6 0 0 7 18 41 69 76 85 89 97 
a* 0 3 6 20 44 69 74 92 98 100 
L,VI,1 0 3 15 36 50 67 89 97 99
2 0 11 29 44 55 70 86 95 99 
3 5 11 26 46 53 69 86 96 98 
4 1 3 1 43 48 т 94 96 99 
5 7 16 29 47 51 13 94 95 100 
6 20 26 44 52 61 81 91 93 99 
M,VI,1 0 3 16 53 60 75 96 99 100
2 0 5 25 42 64 81 91 97 99 
3 O 8 19 40 46 68 90 93 98 
4 0 18 40 62 74 83 96 98 100 
5 2 5 20 39 54 73 95 100 100 
6 10 18 


29 50 50 15 85 96 98 


*Item duplicated on other form. 
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Age 


93 


10 


12 


13 14 


100 


100 
100 


100 


99 100 
95 94 
96 97 
96 97 
95 97 
95 93 


98 98 
98 98 
98 98 
100 100 
98 99 
97 98 


15 
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98 
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526 


оон нн о 
Aochne ne 


оюк M н Ow 
олт юоо љ 


eoonno 


m 
әоое-оо 


Age


94 


12 


100 


16 


17 


95 
99 
94 
99 
99 
99 


99 
100 
98 
94 
93 
98 


100 
88 
80 
99 
98 
91 


91 
98 
94 
99 
90 
96 


18 


98 
84 
76 
94 
89 
93 


96 
94 
92 
97 
98 
92 
TABLE 26 (Cont.)
PER CENTS PASSING ITEMS BY AGE 


Item 
L,XI,1 6 9 21 30
2 7 16 34 51 
æ 0 2 M 28 
4* 4 14 30 40 
5 10 22 40 52 
6 а 17 34 48 
M,XI,1 5 18 29 44
2 7 24 36 51 
3 2 1 33 48 
45. 0 а М 28 
5* 4 17 34 48 
6* 4 14 30 40 
LXIL1 1s 10 а 
2 ай 24 47 
3. 2 10 26 43 
4 4 9 20 39 
5 2 2 10 27 
6 3 11 26 40 
MXui 1 т 28 37 
2° 2 10 26 43 
з 4 6 20 35 
4 1 4 15 33 
5 6 11 33 37 
в 313 33 47 


*Item duplicated on other form. 


Age 
11 12 
43 57 
65 75 
58 11 
50 61 
65 74 
59 12 
54 59 
64 69 
68 60 
58 "1 
59 72 
50 61 
46 64 
56 62 
56 69 
48 63 
52 68 
52 61 
60 65 
56 69 
46 61 
54 68 
53 62 
51 59 
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Item 

L,XIII,1 13 21 29 39 
2 4 8 22 38 
3 0 5 20 28 
4* 1 6 25 40 
5* 0 2 10 20 
6 5 10 24 35 

M,XIII,1 13 22 27 38 
2 1 5 13 29 
3* 0 2 10 20 
4 0 0 2 10 
5* 1 6 25 40 
6 1512 31 

L,XIV,1 0 4 8 
2 0 5 8 
3* 3 15 27 
4 2 5 19 
5 4 20 31 
6 3 4 11 

M,XIV,1 5 12 20 
2* 3 15 27 
3 т 19 27 
4 оз 6 
5 зти 
6 1 6 14 


Age


11 


41 
51 
41 
52 
44 
52 


44 
45 
44 
34 
52 
41 


32 
22 
43 
21 
43 
31 


26 
43 
37 
28 
25 
32 


*Item duplicated on other form, 
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12 


53 
58 
4T 
63 
49 
59 


47 
49 
49 
49 
63 
51 


48 
33 
48 
39 
49 
53 


37 
48 
48 
43 
39 
40 


13 


60 
69 
60 
69 
66 
64 


54 
59 
66 
68 
69 
67 


64 
46 
62 
54 
58 
64 


48 
62 
59 
57 
48 
53 


14 


65 
69 
62 
74 
69 
10 


66 
53 
69 
70 
74 
69 


70 
53 
63 
59 
65 
70 


51 
63 
61 
64 
62 
56 


15 


16 


11 


74 
82 
81 
82 
84 
91 


74 
54 
84 
94 
82 
81 


95 
61 
81 
79 
81 
90 


66 
81 
77 
94 
81 
71 


18 


75 
89 
19 
74 
90 
83 


73 
56 
90 
88 
14 
84 


91 
68 
82 
90 
84 
89 


77 
82 
80 
88 
90 
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9 10 1 
Item 
L,AA,1 0 2 7 
2 2 5 16 
3 0 1 8 
4 2 6 13 
5 0 1 3 
6 2 4 9 
1 a 8 13 
8 4 6 14 
M,AA,1 оо 6 
2 2 6 9 
3 3 6 14 
4 2 T ш 
5 0 0 5 
6 9 12 22 
7 0 0 3 
8 3 6 13 
L,SAI,1 1 2 
2 9 20 
3 4 12 
4* 4 9 
5 оо 
6* 0 2 
M,SAI,1 6 1 
2 1 5 
3* 0 2 
4* 4 9 
5 os 11 
6 3 8 
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Age 


97 


13 


14 


15 


50 
40 
54 
55 
44 
34 
42 
50 


53 
47 
43 
52 
45 
53 
42 
38 


29 
40 
38 
40 
26 
25 


35 
26 
25 
40 
34 
30 


16 


17 


67 
59 
12 
64 
59 
54 
53 
59 


64 
54 
46 
56 
60 
64 
60 
51 


45 
41 
40 
44 
38 
41 


58 
46 
41 
44 
39 
42 


18 


68 
54 
73 
61 
60 
68 
57 


56 


70 
77 
48 
63 
72 
68 
60 
54 


46 
48 
41 
50 
36 
48 


55 
43 
48 
50 
44 
35 


11 
Item 
L,SAII,1 1
2 2 
3 3 
4 0 
5 3 
6% 7 
M,SAII,1 0
2 2 
3 0 
4 4 
5 2 
6» Т 
L,SAIII,1
2 
3 
4 
5 
6 
M,SAIII,1 
2 
3 
4 
5 
6 


*Item duplicated on other form, 
Chapter IX
FACTOR ANALYSES


It is to be regretted that facilities were not available
for the use, necessarily extensive, of the factorial
methods as supplementary criteria for the rejection or
retention of test items. Without committing ourselves to
any particular theory as to the organization of intellectual
abilities, we are inclined to the position that a useful
measuring scale should be highly saturated with one
common factor to the exclusion of all conspicuous group
factors. These conditions are necessary if the scores
of individuals are to be comparable: the presence of a
large group factor, or factors, permits two equal scores
to be qualitatively different and two different scores to
be quantitatively (with respect to the central function being
measured) the same. The realization of this aim to
construct a scale which measures one central function,
we believe, can be attained most exactly by the factorial
methods, but such methods, because of the labor involved,
are not feasible when one is to choose, say, 30 items
from a total of 100.

It will be recalled that several criteria, separately
and in combination, have been employed for the purpose
of rejecting or retaining items for the present revision.
Certain of these criteria tend to select items
which will intercorrelate, thus assuring the presence of
a common factor, although not assuring the absence of
group factors. In the first place, the a priori selection
of items for the original tryout, the subsequent checking
of each item against the 1916 Stanford Revision, and the
further requirement that the item show a satisfactory per
cent passing progression with age, should all operate
to lead to the retention of items having something
in common. Then the additional criterion that the item
must show a fair degree of positive correlation with the
composite point score based on all the items tends to
the inclusion of items which are definitely saturated with
a common factor. In fact, it has been shown¹ that to a
close degree of approximation the correlation of an item
with the total score corresponds to its first (or general)
factor loading.

The purpose of the factor analyses herein presented
was twofold: to supply an objective answer to the
question as to whether the items retained (by other criteria)
are saturated with a common factor to the exclusion
of group factors; and possibly to supply information
which might suggest whether the common factor at one
age level corresponds to the common factor at another
age level. Or stated differently, the Thurstone centroid
method of factorization has been used in order to answer
in full or in part the question: Do the items of a given
level measure a common central factor, and, if so, is
the central factor at one level of maturity the same as
at other levels?


In view of the criteria used for the retention or
rejection of items, one need not be surprised if it is
found that a general factor
given level. If
intelligence,
that group fact
mon factor. The reader should understand
that our purpose here is the analysis of a given set of
; and that little can be generalized from
our limited setup to the broader aspects of the organization
of abilities.


¹See M. W. Richardson, "Notes on the Rationale of Item Analysis,"
Psychometrika, 1936, 1, 69-76.

It should be mentioned at the outset that, in general,
one cannot expect to find large factor loadings or
large common factor variances for individual items because
of their relatively low reliabilities. In other words,
each variable or item will yield a rather large error
variance which, coupled with possible specific variance,
will tend definitely to limit its communality. It has not
been possible to determine the reliability of each of the
many items, but two lines of approach may suggest the
possible magnitude of the reliability coefficients. Thus
for some 34 items for which alternate ‘forms’ (one or
more) are available, we have found for the several age
or experimental groups a total of 112 ‘reliability coefficients’
(tetrachoric r’s). These coefficients have a
median value of .65, and 80 per cent of them are between
.45 and .85. We have no way of knowing whether
these 34 items are representative so far as reliability
is concerned.

The second indication of an item’s reliability is
the inference that can be made from its correlation with
other items. If one fallible item correlates .70 with another
fallible item, it is safe to assume that it has a
reliability in the vicinity of .70 or higher. Thus the maximum
correlation which an item shows with another item
might be taken as a rough approximation to its reliability.
A tabulation from the several tables of intercorrelations
of these maximum values shows a median value of
.66 for the possible 368 maximum r’s, with 90 per cent
falling between .45 and .85. The reason for the total
number, 368, exceeding the number of tests, 258, in the
two forms is that a large number of items appear twice,
i.e. in two different tables of intercorrelations. This
point will become clear when the general plan for the
factor analyses is described.

From the foregoing it would seem safe to say that
in general the item reliability is near .65, with considerable
variation above and below this value. Insofar as we
are concerned with the average common factor variance
for a given set of items, it can be said that it is restricted
by the presence of error variance which amounts,
on the average, to about 35 per cent of the total item variance,
and accordingly the communalities are limited, on
the average, to .65.
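
The limit just stated follows from the usual partition of a standardized item's variance into common, specific, and error parts. The notation below (h² for communality, s² for specific variance, e² for error variance, r_tt for reliability) is the conventional one and is assumed here rather than taken from the text.

```latex
h^2 + s^2 + e^2 = 1, \qquad e^2 = 1 - r_{tt}
\quad\Longrightarrow\quad
h^2 = r_{tt} - s^2 \;\le\; r_{tt} \approx .65, \qquad e^2 \approx .35
```

The communality reaches the reliability only when the specific variance is zero, so .65 is an upper limit on the average rather than an expected value.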

In planning the setup for the factor analyses, it
was our aim to have each item of the scale included in
at least one analysis, and to have a series of overlaps.
For example, the analysis based on the 2-year-old
sample, or experimental age 2, includes the items from
both forms which are located at year levels II and II-6,
while the analysis based on experimental age 2-1/2 includes
the items at levels II-6 and III. Thus the items
at level II-6 overlap or are common to these two analyses.
Similarly, the items at year III are common to the
analyses based on 2-1/2-year-olds and on 3-year-olds,
and so on up the age scale to the analysis at experimental
age 9, where the setup differs from those at the lower
age groups in that it includes items at adjacent levels
VIII and X. The items at level VIII are therefore
common to the analyses at ages 7 and 9, while the items
at level X overlap with the analyses at ages 9 and 11.
This scheme of including items at both levels adjacent
to the experimental age was followed for the four separate
analyses based upon ages 9, 11, 13, and 15, while
experimental age 18 includes the items of the three superior
adult levels. The particular arrangements just
tween analyses, (2) to
alyses necessary,
in per cent passin
fourteen separate factor analyses. The right-hand column
shows the location of the items involved in a particular
analysis. Also given in this table are the actual
number of items used in each analysis, and the number


TABLE 27

SUMMARY OF PLAN
FOR THE SEVERAL FACTOR ANALYSES

Experi-           No. of     No. of
mental            tests      overlapping    Location
age        N      included   tests          of items
2          100    19                        II, II-6
                             7
2-1/2      100    19                        II-6, III
                             11
3           99    25                        III, III-6
                             13
3-1/2      100    26                        III-6, IV
                             11
4          100    26                        IV, IV-6
                             13
4-1/2      100    26                        IV-6, V
                             13
5          100    24                        V, VI
                             11
6          200    24                        VI, VII
                             (illegible)
7          200    24                        VII, VIII
                             11
9          200    35                        VIII, IX, X
                             (illegible)
11         200    30                        X, XI, XII
                             (illegible)
13         200    30                        XII, XIII, XIV
                             (illegible)
15         100    30                        XIV, A.A., S.A.I
                             (illegible)
18         100    30                        S.A.I, S.A.II, S.A.III
of items common to two analyses. It will be noted that
the number of items for, say, the 2-year group is only
19, whereas there are 28 items, including the alternates,
in both forms at levels II and II-6. This and other apparent
discrepancies in the table between the number of
tests available and the number actually used need explanation.
For this purpose it should be noted that although
there are a total of 258 tests in the two forms
combined, there are in fact only 189 test situations, some
of which are duplicated from form to form and some of
which recur with different passing standards within the
same form. It should also be remarked that any two
tests which bear the same name are not necessarily duplicate
or recurring, since some test situations may occur
twice, not exactly as the same situation but as an
alternate form. Now, as regards the factor analyses, it
is obvious that a test which is duplicated can be used
only once for a particular analysis. Thus tests M, XIII, 5
and L, XIII, 4 are identical with identical scoring, and
the situation can be used as just one variable.
riable, 


In case a test at level II recurs at level II-6 with
a different passing standard, it will appear only once in
the analysis on experimental age 2. The reason for
this should likewise be obvious: the performance on
only one test situation, though scored differently, cannot
be regarded as yielding two experimentally independent
variables, and of course the correlation between the two
will be spuriously high.¹ The choice as to which of two
such ‘tests’ should be included as representing the ‘test
situation’ was made on the basis of the
better dichotomy, i.e. nearest 50-50 for passing
or failing. It should, perhaps, be pointed out that the exclusion
of, for example, test L, III-6, 2 from the analysis
on age group 3 because it recurs from level III does not
affect its inclusion as L, III-6, 2 in the analysis based
on age 3-1/2.


¹This important point was not considered in the paper by R. E.
Wright, "A Factor Analysis of the Original Stanford-Binet
Scale," Psychometrika, 1939, 4, 209-220.

All the discrepancies in Table 27 between the
number of used and available tests are explicable on the
basis of either duplication or recurrence with five exceptions:
tests L, II-6, 3; M, II, a; M, II-6, 3 for the
analysis on the 2-year-old group, and test L, VI, 1 for
the analysis on age group 5. These items were not included
because, due to their faulty placement in the provisional
forms of the scale, they were too often omitted
during the standardization-testing. Test L, II-6, a was
excluded from the analysis at age 2-1/2 because of a
rather extreme dichotomy which was accompanied by fourfold
tables with a frequency of zero in one cell. Such
tables lead to tetrachoric r’s which are greater than unity
and therefore of doubtful meaning.
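
Why an empty cell is troublesome can be seen with a simple closed-form approximation to the tetrachoric coefficient. The sketch below uses the cosine-pi approximation, which is substituted here purely for illustration and is not the computing procedure described in the text; under it an empty discordant cell drives the estimate to the boundary value rather than beyond it. The function name and the cell frequencies are hypothetical.

```python
import math


def tetrachoric_cos_pi(a: int, b: int, c: int, d: int) -> float:
    """Cosine-pi approximation to tetrachoric r for the fourfold table [[a, b], [c, d]].

    a and d are the concordant cells (pass-pass, fail-fail);
    b and c are the discordant cells.
    """
    if b * c == 0:
        if a * d == 0:
            raise ValueError("both diagonals contain an empty cell")
        return 1.0   # empty discordant cell: the estimate sits at the limit
    if a * d == 0:
        return -1.0  # empty concordant cell: the opposite limit
    ratio = math.sqrt((a * d) / (b * c))
    return math.cos(math.pi / (1.0 + ratio))


print(tetrachoric_cos_pi(25, 25, 25, 25))  # independent items
print(tetrachoric_cos_pi(40, 10, 10, 40))  # moderately related items
print(tetrachoric_cos_pi(8, 0, 42, 50))    # extreme dichotomy with an empty cell
```

In either case the data no longer determine a usable coefficient, which is the practical reason for avoiding extreme dichotomies.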

It will also be seen from Table 27 that all N’s
have been reduced to 100 or 200 (for age 3 only 99 cases
were available). This was done in order to facilitate
the determination of the tetrachoric coefficients. The
dropping of a few cases to give even N’s was not exactly
random in that cases with more incomplete records were
dropped first. Some exceptions to these N’s should be
noted. At experimental age 6, tests L, VI, 1 and M,
VII, 2 were included with N’s of 183 and 167 respectively,
and for age group 7, tests L, VII, 3; M, VII, 1; and
M, VII, 2 were included with reduced N’s of 190, 184,
and 182. These smaller N’s resulted from the fact that
these items were not administered to all the subjects at
these ages because of faulty placement in the provisional
forms. It seemed reasonable to include these items since
the N’s were still fairly large, whereas for the items
mentioned in the previous paragraph the reduced N’s were
too small to justify their inclusion.

Although certain tests were not included in certain
analyses, we have succeeded in having each test situation
included in at least one of the fourteen analyses, and at
the same time have avoided extreme dichotomies. As a
matter of fact, 93 per cent of the dichotomies are between
20 and 80 per cent for passing (or failing), one-half
are between 35 and 65, and only four are more extreme
than 10-90. Two of these four are 9-91, the
other two, 8-92.

Perhaps a word should be said concerning the
precautions taken to guarantee the accuracy of the computation
involved in determining 4780 tetrachoric coefficients
and in extracting three centroid factor loadings for fourteen
tables of intercorrelations based on from 19 to 35
variables. Hollerith cards, 1900 in number, were punched
by a trained operator and at a later time a duplicate
set was punched by the same operator; then the two
cards for an individual were checked one against the
other. The agreement of the several marginal totals
was an absolute check on the fourfold table frequencies
as copied from the Hollerith sorter, and the internal
checks provided by the centroid method are sufficient to
insure computational accuracy for the factor loadings.
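
The figure of 4780 coefficients can be checked against the numbers of tests listed in Table 27, on the assumption that an analysis of n variables calls for n(n - 1)/2 intercorrelations. The variable names below are illustrative.

```python
# Number of tests included in each of the fourteen analyses (Table 27).
tests_per_analysis = [19, 19, 25, 26, 26, 26, 24, 24, 24, 35, 30, 30, 30, 30]

# Each analysis of n variables needs one coefficient per pair of variables.
pairs_per_analysis = [n * (n - 1) // 2 for n in tests_per_analysis]
total_coefficients = sum(pairs_per_analysis)

print(len(tests_per_analysis), "analyses")
print(total_coefficients, "tetrachoric coefficients")
print(sum(tests_per_analysis), "test entries across all analyses")
```

The same list gives the range of 19 to 35 variables mentioned in the text, and its sum matches the 368 maximum r's tabulated earlier in the chapter.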

According to the criterion first proposed by Thurstone,
one should continue to extract factors until the
intercorrelations. This criterion
time (1937) these analyses were
carried out, with the result that one factor seemed sufficient
for each of the 14 separate analyses. Since the
adequacy of the residual criterion had been questioned,
two more factors. It was thought
extraction of second loadings
might be superfluous,
a mere dallying with chance,
but it was done lest a more exact criterion be devised
which would call for at least three factors.
has since found a more
them more analogous to
ms. This criterion required that the
variance of the residuals, as
partials, shall approach the sampling variance of a zero
correlation based on the given N.

The application of either the ordinary, or the partial,
residual criterion to data based upon tetrachoric
correlations is greatly complicated by the fact that the
sampling variance of tetrachorics is partly a function of
the dichotomies. Obviously there will be no single critical
value to be approached by the variance of the residuals.
We have therefore considered two values as representing
possible magnitudes for the sampling variance,
one based upon typical cuts of 35-65, and one based upon
the more extreme 15-85 cuts. Now, some few r’s will
have errors somewhat smaller (as much as .009) than
obtain for 35-65 cuts, while other r’s will have errors
as much as .05 larger than hold for 15-85 cuts.
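
The two reference values can be reproduced from the usual large-sample formula for the standard error of a tetrachoric r when the true correlation is zero, sqrt(p q p' q') / (z z' sqrt(N)), where z and z' are the normal ordinates at the two points of dichotomy. The sketch below assumes that formula and assumes both variables are cut at the same point; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def se_zero_tetrachoric(p1: float, p2: float, n: int) -> float:
    """Standard error of tetrachoric r at r = 0 for dichotomies p1 and p2."""
    std_normal = NormalDist()
    # Ordinates of the unit normal curve at the two points of dichotomy.
    z1 = std_normal.pdf(std_normal.inv_cdf(p1))
    z2 = std_normal.pdf(std_normal.inv_cdf(p2))
    return sqrt(p1 * (1 - p1) * p2 * (1 - p2)) / (z1 * z2 * sqrt(n))


for n in (100, 200):
    for cut in (0.35, 0.15):
        print(n, cut, round(se_zero_tetrachoric(cut, cut, n), 3))
```

Under these assumptions the results appear to agree with the standard errors entered in Table 28 (.166 and .235, and .117 and .166), taking the larger pair to belong to the groups of 100 and the smaller to the groups of 200.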

The complete data concerning the standard errors
of zero tetrachorics, the standard deviations of the
ordinary residuals and of the residuals as partials are
presented in Table 28. An examination of this table
certainly emphasizes the need for a valid criterion for the
number of factors, especially when matrices of tetrachoric
correlations with varying dichotomies are being analyzed.
Consideration of either the ordinary or the partial
residual S.D.'s makes one rather dubious as to whether the
extraction of more than one factor can be justified.
Perhaps a second factor is needed for the analyses at ages
2, 2-1/2, 6, 7, 11, and 18; but it seems unlikely that a
third factor would contain anything more than chance. We
are, nevertheless, presenting loadings for three factors
for each analysis.
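
The residual criterion can be illustrated on a toy matrix. The sketch below is not the author's computation: it assumes the textbook centroid rule (each first loading is a column sum divided by the square root of the grand total, with communality estimates already placed in the diagonal), uses hypothetical one-factor data, and compares the spread of the first factor residuals with 1/sqrt(N), the approximate sampling S.D. of a zero product-moment correlation. For tetrachoric correlations the comparison value would be larger and would depend on the dichotomies. All function names are illustrative.

```python
from math import sqrt


def first_centroid_loadings(r):
    """First centroid loadings: column sums over the square root of the grand total.

    r is a symmetric correlation matrix (list of lists) whose diagonal
    already holds the communality estimates.
    """
    col_sums = [sum(row[j] for row in r) for j in range(len(r))]
    total = sum(col_sums)
    return [s / sqrt(total) for s in col_sums]


def residual_sd(r, k):
    """Standard deviation of the off-diagonal residuals r_ij - k_i * k_j."""
    n = len(r)
    res = [r[i][j] - k[i] * k[j] for i in range(n) for j in range(i + 1, n)]
    mean = sum(res) / len(res)
    return sqrt(sum((x - mean) ** 2 for x in res) / len(res))


# Hypothetical one-factor data: every correlation is the product of two loadings.
true_k = [0.8, 0.7, 0.6, 0.5]
r = [[a * b for b in true_k] for a in true_k]

k = first_centroid_loadings(r)
sd = residual_sd(r, k)
criterion = 1 / sqrt(100)  # approximate sampling S.D. of a zero r when N = 100
print([round(x, 3) for x in k], round(sd, 3), sd <= criterion)
```

When the residual spread is no larger than the sampling value, as in this constructed case, nothing is left for a second factor to explain.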

There is a point of considerable interest which
results from a perusal of Table 28. It will be noted that
the first, second, and third factor residual distributions
consistently show less variation for those age groups
where N equals 200 than where N equals 100. This might
have been surmised: the reduction of sampling errors
has definitely reduced the residuals and therefore the
magnitudes of subsequent factor loadings. A
second factor loading, for example, of .40 is not as real
at age 5 (N, 100) as at age 6 (N, 200). As will be seen
later, the per cent contribution to test variance for the
second and third factors is not as great when N equals
200 as when N equals 100, whereas the variance
contribution of the first factor seems to be independent
of N. On a priori grounds it does not seem reasonable to
believe that the items involved in the analyses at ages 6,
7, 9, 11, and 13 (N’s of 200) are such as to show a
factorial structure consistently different from that for
the other analyses. Certainly the items in the analysis
on age group 5 are markedly similar to those in the
analysis at age 6 (11 items are in common, and 6 highly
similar item situations are also in common), and the
analyses at ages 13 and 15 are likewise based upon items
which are much alike; yet the residual variations for
ages 5 and 6 and for ages 13 and 15 differ in the
direction of greater variation for the smaller samples.
The percentage contribution of the second factor to test
variance also shifts as we pass from age group 5 to 6;
when N is smaller, the second and third factors seem to
contribute more to test variance. All these facts tend
to suggest that the second and third factor loadings are
not as stable from the sampling standpoint as the first
factor loadings, and that the general magnitude of
residuals and subsequent loadings is a function of the
size of the sample.

TABLE 28
STANDARD DEVIATIONS OF ORDINARY AND PARTIAL [...]
DUE TO SAMPLING

[Only the columns of standard errors for zero order
tetrachorics can be recovered: .166 (35-65 cuts) and .235
(15-85 cuts) where N is 100; .117 and .166 where N is 200.
The standard deviations of the ordinary and partial
residual distributions are lost.]

It must be suspected that attempts to rationalize 
the meaning of the third factor loadings, and to a large 
extent the second factor loadings, have in general led to 
grief, and consequently to the conclusion that the criteria 
used are not too crude. This general problem regarding 
the influence of sampling errors will again confront us 
as we proceed with an exposition of the further findings 
of the several analyses. 

The results, in terms of centroid factor loadings,
for the several analyses, are presented in Tables 29-42.
The name of the test or item is preceded by its location
as to form and age level and followed by its three factor
loadings. The right-hand, apparently incomplete, column
gives the first factor loadings for the overlapping items
as computed on the next higher experimental age group.
Thus the column in Table 29 headed k₁ (2-1/2) gives
the k₁ values found in the analysis at age 2-1/2 for those
items which are common in the 2- and 2-1/2-year
analyses. At the bottom of each table will be found Σk²/n,
[...] example, item M, IV, [...] loading of only .241 for
the [...] not be considered a very sati[...] measurement
of a unitary [...] can ‘block counting,’ M, [...]. [...]
there is considerable variati[...] first factor loadings.
How m[...] writer, however, has published empirical data
on the sampling errors of centroid factor loadings which
indicate that a first factor weight is subject to sampling
fluctuations of about the same order of magnitude as a
correlation coefficient of the same size, and of the same
type, product moment or tetrachoric, as in the starting
matrix.
It may be of some interest to list the items or
item situations which tend to yield high first factor
loadings and those which yield low ones. In the case of
items or item situations for which two or more loadings
are available (i.e. overlapping tests, recurring tests, and
recurring test situations), it was required that the
loadings be consistently high, or low, in order to be
included in the listings. ‘High’ and ‘low’ are to be taken
in a relative sense. Rather than throw items from all
levels together, the listings are made separately for II to
IV-6, V to XI, and XII to S.A. III. Each of these
trichotomies includes about one third of the items in the
scales.

At levels II to IV-6, high first factor loadings 
were yielded by these items: 

Picture vocabulary 
Identifying objects by name 
Response to pictures 
Comparison: balls; sticks 
Comprehension 

Opposite analogies 
Pictorial identification 
Materials 

Low loadings occurred for the following items
of levels II to IV-6:

Block building: tower
Block building: bridge
Three-hole form board: rotated
Motor coordination
Copying a circle
Drawing a cross
Three commissions
Stringing beads

The items or item situations at levels V to XI
which are highly saturated with the general factor are:

Pictorial likenesses and differences
Similarities: two things
Vocabulary
Verbal absurdities
Similarities and differences
Naming the days of the week
Dissected sentences
Abstract words

Those at levels V to XI with low loadings are as
follows:

Paper folding: triangle
Patience: rectangles
Copying a bead chain
Copying a bead chain from memory
Picture absurdities
Word naming
Word naming: animals
Block counting


Among the items at the upper levels, XII to S.A.
III, the following tend to have the highest first factor
loadings:

Vocabulary
Verbal absurdities
Abstract words
Differences between abstract words
Arithmetical reasoning
Proverbs
Essential differences
Sentence building

The items at the upper levels which are least
saturated with the central factor are:

Problems of fact
Copying a bead chain from memory
Memory for stories
Enclosed box problem
Paper cutting
Plan of search
Repeating digits
Repeating digits: reversed

When we turn to the question of the presence of
group factors we are again baffled by ignorance concerning
the possible sampling variation in factor loadings.
The empirical study referred to above shows that second
and third centroid factor loadings have standard errors
which are much greater than those of correlations of the
same magnitude. Two samples, each of size 100, may
lead to second factor loadings for a given variable which
differ by as much as .50 or .60. These considerations,
coupled with the possible insignificance of the second and
third factors as discussed in relation to the criteria for
determining the number of factors to be extracted, force
one to be skeptical as to what can be said concerning
group factors in this study. Nevertheless, whenever the
descriptions of items in terms of their second and third
factor loadings show logical consistencies, the fact
cannot be ignored, but of course it must be remembered that
chance can so operate as to lead to apparent consistencies.
Before discussing the possible meaning of the second
and third factors for the separate analyses, it should be
noted that on the average the second factor accounts for
from 5 to 11 per cent of the test variance and the third
contributes from 4 to 7 per cent.
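
On the usual reading, the contribution of a factor to test variance is the mean of its squared loadings, the quantity printed as Σk²/n at the foot of each table. As a check on that reading, the sketch below applies it to the 26 third factor loadings printed for the analysis at age 4 (Table 33), taken as they appear in the table; the variable and function names are illustrative only. The result is about .065, which agrees with the figure at the foot of that column and lies within the 4 to 7 per cent range just quoted.

```python
# Third factor loadings for the 26 items of the age 4 analysis (Table 33).
k3_age4 = [
    0.212, 0.206, 0.035, 0.235, -0.090, -0.226, -0.188, -0.173, 0.419,
    0.124, -0.281, -0.201, -0.207, 0.137, -0.650, -0.245, 0.203, -0.226,
    0.330, 0.291, 0.188, -0.109, 0.300, 0.266, -0.199, -0.193,
]


def variance_contribution(loadings):
    """Mean squared loading, i.e. sum(k^2) / n."""
    return sum(k * k for k in loadings) / len(loadings)


share = variance_contribution(k3_age4)
print(f"{share:.3f}  ({100 * share:.1f} per cent of test variance)")
```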

When the items in the analysis at age 2 are plotted
with reference to the second and third centroid axes, one
gets a vague impression that the second factor
differentiates slightly between items which involve some
kind of ‘identifying’ or ‘knowing’ of objects and items of
a ‘motor’ or ‘memory’ nature, while these latter types are
roughly separated along the third axis. If we let the
second axis be the abscissa and the third the ordinate (a
scheme followed in subsequent discussion), the tests
falling in the two right-hand quadrants involve
‘identifying,’ while the upper left quadrant contains
‘motor’ tests and the lower left contains the two digit
tests. A plot of the tests included in the analysis at age
2-1/2 shows this same general feature provided the
arbitrary centroid axes are rotated counter-clockwise
through about 90 degrees. Here again, however, the ‘motor’
tests seem to merge with the ‘identifying’ tests. That
these results may not be entirely due to chance or sampling
is supported by the fact that the variance of the first
factor residuals (see Table 28) indicates that one factor
may not be sufficient to explain the intercorrelations on
age groups 2 and 2-1/2. It should also be noted that the
second factor at age 2 accounts for an average of 9.5 per
cent of the test variance and that 10.9 per cent at age
2-1/2 is accounted for by the second factor. (It is not
assumed or implied that the second factor as found on one
age group is the same as that found at another age.) For no
other analysis except at age 18 does the second factor
contribute so much to the test variance.
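
Rotating the arbitrary centroid axes leaves the configuration of tests unchanged and only re-expresses each test's coordinates. A minimal sketch, using a made-up pair of loadings and a hypothetical helper name, of what a counter-clockwise rotation of the second and third axes through 90 degrees does to one test's (second, third) loadings:

```python
from math import cos, sin, radians


def rotate_axes(k2: float, k3: float, degrees: float) -> tuple:
    """Coordinates of a test after turning both axes counter-clockwise."""
    t = radians(degrees)
    return (k2 * cos(t) + k3 * sin(t), -k2 * sin(t) + k3 * cos(t))


# Hypothetical test with loadings .30 on the second axis and -.20 on the third.
new_k2, new_k3 = rotate_axes(0.30, -0.20, 90)
print(round(new_k2, 3), round(new_k3, 3))

# The sum of squared loadings on the two axes is unchanged by the rotation.
assert abs((new_k2**2 + new_k3**2) - (0.30**2 + 0.20**2)) < 1e-12
```

A test lying in the lower right quadrant before the rotation appears in the lower left quadrant afterwards, so quadrant labels from two analyses can only be compared after the axes have been brought into the same position.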

The tests in the analyses at ages 3, 3-1/2, 4, and
4-1/2 when plotted with reference to the respective second
and third centroid coordinates fail to fall in any
logical groups or form clusters or show simple structure,
and therefore no meaning can be attached to the second
and third centroid axes or rotation thereof. If group
factors exist among the items at these levels of maturity,
our samples are too [...]

[...] analysis at age 5 reveals some order: the
upper right quadrant contains items which are, relative
to the other items, more ‘verbal’ in nature (two pictorial
tests in this same [...] left quadrant contains
four ‘numerical,’ number concept, [...]. This
differentiation [...] and the more ‘verbal’ tests
stands out rather clearly in the analysis based on the
6-year-old group, where the separation is again along the
second (arbitrary) axis. The third axis, as at age 5,
seems void of meaning. It will be recalled that the spread
of the first factor residuals (see Table 28) for the
analysis on age group 6 was such as to suggest the
extraction of a second factor.

There is a lack of consistent groupings for the
tests in the analysis on age 7, while for age 9 the ‘verbal’
type tests seem opposed, along the second axis, to the
‘memory’ type tests. This same separation occurs for
the 11-year group, and therefore this differentiation of
the items located at age levels 8, 9, 10, 11, and 12 is
not likely to be the result of chance. For both analyses
the ‘repeating of digits’ tests are roughly opposed along
the third axis to tests like ‘memory for designs,’ but in
general it is difficult to assign any meaning to the third
centroid factor.

When we turn to the results for age 13, we find
an inconspicuous tendency for the ‘verbal,’ the ‘problem’
and the ‘memory’ types of tests to be separated, but
the scattering and merging are such as to preclude any
very definite conclusions concerning a group factor. A
similarly vague differentiation between ‘problem’ and
‘verbal’ tests is found for the items in the analysis on
experimental age 15.

The factorial results based on age group 18 for
the analysis of the items at the three superior adult
levels are the most disconcerting so far as the possible
presence of a sizable group factor is concerned. The
residual variance (see Table 28) indicates that more than
one factor is needed to explain the intercorrelations, and,
as is seen from Table 42, the second centroid factor
contributes an average of 10.7 per cent of the test
variance. This second factor appears to involve the
difference between ‘verbal’ items and tests of immediate
memory such as repeating digits. In fact, the four tests
which require the repeating of 8 or 9 digits are more
highly saturated with the second than with the first or
more general factor. Even though it is fairly easy to
pick out this conspicuous, though small, digit cluster, it
is difficult if not impossible to assign definite meaning
to the haphazard pattern formed by the remaining items
in the analysis on the 18-year group.

The foregoing consideration of the outcome of the
several factor analyses leads to the definite conclusion
that all the items in a particular analysis are saturated
in varying degrees with a general factor, and to the
tentative conclusion that one factor is sufficient to
account for the intercorrelations except those based on
experimental ages 2, 2-1/2, 6, 18, and possibly 7 and 11.
The items at each of these levels apparently involve one or
more group factors, but in no case is the evidence
sufficiently clear-cut to justify any elaborate deductions
regarding the nature of these possible group factors. We
have already suggested provisional characterization for
these item groupings; to say more would require analyses
[...] larger samples.

[...] remarked, perhaps at this place, [...]
any two or more items which [...] ‘forms’ of the same
‘test situation’ [...] so far as their factorial
description is concerned, and [...] items will form a [...]
such a small cluster to the status of a psychologically
meaningful factor. One can be sure, however, that these
[...]-age. Unfortunately, this latter statement cannot be
made with regard to the ‘group factors’ which seem to
emerge at age 2 and 2-1/2, at age 5 and 6, and at age 18.
These ‘group factors,’ the existence and importance of
which may be questioned, appear to be of sufficient
prominence to cause small, though not inconsequential,
qualitative differences between the I.Q.'s of two
individuals when the mental maturity of either or both is
at any one of the levels where these factors emerge.

The absence of conspicuous group factors, though
necessary, is not a sufficient condition either for the
generalization that the I.Q.'s for individuals of differing
maturity levels are strictly comparable or for
generalizations regarding the relative constancy of I.Q.'s.
In either case we must have the added condition that the
same central function is being measured by the items at the
various maturity levels. Certain of the results of the
factor analyses which we have already discussed tend to
show that each item at a given level measures a factor or
function which is common to all the items at that level,
and we will now consider evidence, not conclusive though
certainly more than presumptive, which indicates that
the common or first factor at one level is the same as
at other levels.

The first approach to this problem is by way of
the series of overlapping tests, i.e. the tests which are
common to any two adjacent analyses. The general factor
as found for the analysis on age group 2 is the factor
common to the items located at age levels II and II-6,
while the general factor as found for experimental age
2-1/2 is common to the items at levels II-6 and III. The
items at age level II-6, which overlap or are included in
these two analyses, will have two sets of factor loadings,
one for the age 2 analysis and the other for the age 2-1/2
analysis. In attempting to answer the question as to
whether or not the common factor at one age represents
the same function as that found at the next higher age we
are faced with four alternatives: factors the same or
different and the two sets of loadings in agreement or
disagreement. Let us examine these four possibilities.

If the factors differ, it seems logical to assume
that the two sets of loadings will also differ; and if the
factors are the same, it seems reasonable to assume
that the loadings will agree. But it is the converse of
these propositions which must concern us since the
available data yield information concerning the concurrence
of the two sets of first factor loadings for the
overlapping tests. If the loadings show disparities which
are greater than are to be expected on the basis of
sampling, it would seem safe to suppose that the two
factors are not identical. If the two sets of loadings
correspond, within limits of the sampling errors, can it be
said that the two common factors are homologous? This, it
seems to the writer, can be answered in the affirmative by
way of a negative: if the two factors really differed, the
loadings would not agree.

If the above logic is sound, we can proceed to a
study of the behavior of the first factor loadings for the
several series of overlapping tests. Before doing this,
it should be recalled that the standard error of a first
centroid loading is, according to our empirical results,
about the same as that for a correlation coefficient of
the same magnitude. A value of .60 may be taken as
representative of the first factor loadings. A tetrachoric
r of this magnitude based on typical 35-65 cuts would
have a standard erro[...] .60 would have [...], while
lower loadings would have larger errors. To be on the
conservative side for such [...] we draw, let us take
smaller values, .10 and .07 for N's of 100 and 200
respectively, [...] approximations for the sampling errors
of the [...] loadings in this study. [...]
this time to compare the two sets of loadings for the
tests which overlap two analyses, i.e. the k₁ values in
the right-hand column with the first column of figures.
In general the loadings for all 13 groups of overlapping
tests show marked agreement. Of the 136 differences,
91 are less than .10, and 120 are less than .20. There
are, however, a few rather large, though not necessarily
statistically significant, discrepancies which should
receive attention. We will discuss these in the order in
which they occur. In Table 29 tests L, II-6, 2,
‘identifying parts of body’ and M, II-6, a, ‘stringing
beads,’ and in Table 30 test L, III, 3, ‘block building:
bridge,’ yield differences which are about 1.5 times their
standard errors. These differences may be real, but it is
difficult to explain why presumably similar tests do not
also show differences.

For the tests which overlap the analyses at ages
3 and 3-1/2 (see Table 31), there are three differences
(for items L, III-6, a; L, III-6, 1; and L, III-6, 5) which
are possibly non-chance, and one difference (item M,
III-6, a) which is definitely significant. Although the
writer sees no a priori reason for these discrepancies,
it cannot be argued that they are not real. But the fact
that the alternate ‘form’ (M, III, 3) of test L, III-6, 5
yields a loading of .641 (age 3 analysis) compared with
.419 for L, III-6, 5 makes one skeptical as to the reality
of a .27-point difference. The next possible non-chance
difference occurs for test M, IV-6, 2, ‘definitions,’ with
loadings of .649 (age 4 analysis) and .431 (age 4-1/2
analysis), as can be seen in Table 33. An alternate
‘form’ of this test occurs as L, V, 3 in the analyses at
ages 4-1/2 and 5 with loadings of .427 and .703
respectively. If these were accompanied by shifts in other
item loadings, it might be suggested that the common
[...] adjacent analyses. A difference which is about twice
its sigma will be noted in Table 36 for test L, VII, 2,
‘similarities: 2 things.’ The fact that the other thirteen
differences in this table are small and insignificant tends
to overshadow this one difference. A statistically
significant difference will be found in Table 37 for test
M, VIII, 6, ‘opposite analogies,’ but here again the
remaining differences are so small, averaging less than
.05, that it seems illogical to suppose that the two common
factors are different. The analyses at ages 9 and 11 (see
Table 38) give one rather large, and likely non-chance,
discrepancy: test M, X, 2, ‘memory for stories.’
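
The judgments of ‘possibly non-chance’ rest on comparing each difference with its standard error. A rough sketch of that comparison follows. It assumes that the two loadings come from independent samples, so that the standard error of their difference is the square root of the sum of the two squared errors, and it uses the conservative values of .10 (N of 100) and .07 (N of 200) adopted earlier. The function name is illustrative, and the text does not state that exactly this formula was used.

```python
from math import sqrt

SE_BY_N = {100: 0.10, 200: 0.07}  # conservative standard errors adopted in the text


def critical_ratio(k_a: float, k_b: float, n_a: int, n_b: int) -> float:
    """Difference between two loadings divided by its standard error."""
    se_diff = sqrt(SE_BY_N[n_a] ** 2 + SE_BY_N[n_b] ** 2)
    return abs(k_a - k_b) / se_diff


# 'Definitions', M, IV-6, 2: .649 (age 4) against .431 (age 4-1/2), N of 100 each.
definitions = critical_ratio(0.649, 0.431, 100, 100)
# Its alternate form, L, V, 3: .427 (age 4-1/2) against .703 (age 5).
alternate = critical_ratio(0.427, 0.703, 100, 100)
print(round(definitions, 2), round(alternate, 2))
```

Under these assumptions both ratios fall between 1.5 and 2, large enough to attract attention without being clearly significant.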

Thus out of a total of 136 pairs of loadings for
overlapping tests, only 12 show differences large enough
to attract attention, and of these 12, only 3 seem to
possess statistical significance as judged by
approximations, very likely underestimates, of the sampling
errors. That so close an accord is found for two sets of
factor loadings for overlapping variables is indeed
surprising, especially when it is remembered that we have
not only moved tests from one ‘battery’ or setup to another
but have also made the analyses on different samples. These
findings certainly lead to the belief that the general
factors at two adjacent levels are identical or nearly so.
As regards the centroid factors found for non-adjacent
experimental ages, it would seem likely that they too are
homologous, but one cannot here apply without reservation
the mathematical principle that things equal to the [...].
Nevertheless, it [...]

[...] alternate ‘forms,’ within an analysis. Such tests or
test situations may appear at several different levels, and
thus we are provided with additional overlaps which are
not confined to adjacent analyses. For example, the two
‘repeating digits’ items which are included in the
analysis on age 2 overlap as identical tests in the
analysis on age 2-1/2, which, in turn, includes two other
‘repeating digits’ tests, etc. Let us examine the behavior
of the first factor loadings for such of these tests as
occur (recur) in four or more analyses. Whenever within a
given analysis a test situation occurs more than once we
shall average its first factor loadings in order to have a
more typical weight.

The results of this approach to the problem of the
equivalence of the general factors being measured at the
various levels will be found in Table 43, in which an
omission means that the test situation did not occur in a
particular analysis. The general tenor of this table is
the marked consistency in loadings for a given test as
we follow it from lower to higher levels. It seems to
the writer that such accord is not due to chance or
accident, and that support is herewith found for the
conclusion that the several common factors are homologous.
There are, however, a few trends in Table 43 which force
us to modify this conclusion. It will be noted that
‘repeating digits’ and ‘picture vocabulary’ have higher
loadings at ages 2, 2-1/2, and 3 than at later ages, that
‘opposite analogies’ has somewhat higher values at 3-1/2,
4, and 4-1/2 than later, and that the loadings for
‘vocabulary’ gradually rise from age 6 to 18. These changes
may indicate a change in the nature either of item
situations or of the general factor as we pass from lower
to higher levels, or both. For instance, ‘repeating digits’
may at the early ages depend upon whatever ability is
involved in attention and in following directions, whereas
at later ages it may depend more upon immediate memory.
As regards a possible change in the general factor, it may
be argued that the verbal element becomes more conspicuous
at the higher levels and that the behavior of the loadings
for ‘vocabulary’ supports this possibility. It seems quite
logical to the writer to believe that some differences do
exist in the common factor called for at various age
levels. This is a tentative conclusion, and one which
applies to changes in the nature of the test items rather
than to changes in individuals as one passes from childhood
to maturity.

The results of the several analyses reported in
this chapter may be conveniently summarized under three
headings: (1) The several items at a particular maturity
level are saturated, though in varying degrees, with a
factor which is general or common to the tests located
at the given level. When one considers the unreliability
of single items the amount of variance due to the several
first factors, one for each analysis, is reasonably
satisfactory. (2) The first factor for each analysis is
sufficient to account for the intercorrelations except in
the analyses based on experimental ages 2, 2-1/2, 6, and
18. At these levels there seems to be some evidence
that certain items are measuring more than a single
factor plus specifics, but our limited samples are
insufficient for definitely establishing the importance or
nature of the possible group factors at these levels. (3)
The first factor loadings for tests which overlap two
adjacent analyses, and for tests and test situations which
recur or run through several analyses, show a high degree
of conformity. This, despite some conflicting data,
has been interpreted as showing that the common factors
[...]

The implications [...] fully qualified. [...] group
factors [...] necessary for the quantitative and
qualitative [...] of I.Q.'s for individuals of about the
same [...] maturity level, but the testing procedure is
such that an individual’s failures and successes may spread
over a greater range of tests than is included in our
separate analyses. Thus if group factors exist within this
larger range of items, small qualitative differences in
I.Q.'s will be possible. This point could be investigated,
not by factorial methods, since larger ranges of items than
we have used would involve extreme dichotomies of passing
or failing, but by an analysis of the patterns of items
failed or passed by those individuals whose spread of
performance is greatest (very large samples would be
needed for such an analysis). A second implication, based
on the conclusion that there is just one common factor at
each level and also that the several general factors found
at the various levels are identical or nearly so, is that
the I.Q.'s for individuals of differing mental-maturity
levels or for the same individual at different stages of
development are comparable quantitatively and
qualitatively. This should not be construed as an argument
for I.Q. constancy, but rather as evidence that a condition
necessary for I.Q. constancy has been met. If it is found
by a follow-up study that the I.Q.’s of six-year-olds
correlate highly with their I.Q.'s as of age 2, the finding
would support our conclusion concerning homologous
common factors. If, however, such a correlation were
fairly low, our conclusion might be open to suspicion but
not proved invalid, since other conditions, such as
inequalities in maturation rate or differences in
experience or state of health, etc., may be responsible for
the apparent lack of constancy.

TABLE 29

FACTOR LOADINGS FOR ANALYSIS AT AGE 2

Location     Name of test
L,II,a       Obey. simple commands
L,II,1       3-hole form board
L,II,4       Block building: tower
L,II,5       Picture vocabulary
L,II,6       Word combination
L,II-6,a     Identify. obj. by name
L,II-6,1     Identify. obj. by use
L,II-6,2     Identify. parts of body
L,II-6,5     Repeating 2 digits
L,II-6,6     3-hole fm. bd. rotated
M,II,1       Delayed response
M,II,2       Identify. obj. by name
M,II,3       Identify. parts of body
M,II,5       Picture vocabulary
M,II-6,a     Stringing beads
M,II-6,1     Identify. obj. by use
M,II-6,2     Motor coordination
M,II-6,5     Repeating 2 digits
M,II-6,6     Obey. simple commands

[The three factor loadings, the k₁ (2-1/2) column, and
Σk²/n for this table are not recoverable; some location
codes are uncertain readings.]


TABLE 30

FACTOR LOADINGS FOR ANALYSIS AT AGE 2-1/2

Location     Name of test                 k₁ (3)
L,II-6,1     Identify. obj. by use
L,II-6,2     Identify. parts of body
L,II-6,3     Naming objects
L,II-6,5     Repeating 2 digits
L,III,a      3-hole fm. bd. rotated        .513
L,III,2      Picture vocabulary            .816
L,III,3      Block building: bridge        .566
L,III,4      Picture memories              .495
L,III,5      Copying circle                .332
L,III,6      Repeating 3 digits            .746
M,II-6,a     Stringing beads
M,II-6,2     Motor coordination
M,II-6,5     Repeating 2 digits
M,II-6,6     Obey. simple commands
M,III,2      Picture vocabulary            .722
M,III,3      Identify. obj. by use         .641
M,III,4      Drawing vertical line         .557
M,III,5      Naming objects                .663
M,III,6      Repeating 3 digits            .661

[The three factor loadings and Σk²/n for this table are
not recoverable.]


TABLE 31

FACTOR LOADINGS FOR ANALYSIS AT AGE 3

Location     Name of test                  k₁      k₂      k₃    k₁ (3-1/2)
L,III,a      3-hole fm. bd. rotated       .513   -.199   -.124
L,III,1      Stringing beads              .476   -.119    .156
L,III,2      Picture vocabulary           .816    .234    .194
L,III,3      Block building: bridge       .566    .244   -.153
L,III,4      Picture memories             .495   -.263    .226
L,III,5      Copying circle               .332    .138    .235
L,III,6      Repeating 3 digits           .746   -.011   -.327
L,III-6,a    Drawing cross                .636   -.467    .337     .379
L,III-6,1    Obey. simple commands        .584    .118   -.159     .778
L,III-6,3    Comparison of sticks         .664   -.257   -.129     .760?
L,III-6,4    Response to pict. I          .732?   .129   -.309     .593
L,III-6,5    Identify. obj. by use        .419    .387    .321     .694
L,III-6,6    Comprehension I              .628   -.043   -.331     .663
M,III,2      Picture vocabulary           .722    .254    .149
M,III,3      Identify. obj. by use        .641    .337   -.177
M,III,4      Drawing vertical line        .557   -.700   -.264
M,III,5      Naming objects               .663    .298    .087
M,III,6      Repeating 3 digits           .661    .260   -.309
M,III-6,a    Matching objects             [?]    -.416    .337     .728
M,III-6,1    Comparison of balls          [?]    -.264    .198     .755?
M,III-6,2    Patience: pictures           .653   -.134   -.127     .543
M,III-6,3    Discrim. animal pict.        .603    .124    .103     .667
M,III-6,4    Response to pict. I          .773    .183    .141     .723
M,III-6,5    Sorting buttons              .590    .159    .174     .693
M,III-6,6    Comprehension I              .637    .043   -.324     .677

Σk²/n                                     .377    .075    .054

[Entries followed by ? are uncertain readings; [?] marks an
entry that cannot be recovered.]


TABLE 32

FACTOR LOADINGS FOR ANALYSIS AT AGE 3-1/2

Location     Name of test                  k₂      k₃    k₁ (4)
L,III-6,a    Drawing cross                .296    .369
L,III-6,1    Obey. simple commands        .136   -.175
L,III-6,2    Picture vocabulary           .125   -.229
L,III-6,3    Comparison of sticks        -.230    .306
L,III-6,4    Response to pict. I         -.018    .300
L,III-6,5    Identify. obj. by use       -.152   -.174
L,III-6,6    Comprehension I             -.282   -.303
L,IV,a       Memory for sentences I      -.343    .235    .603
L,IV,2       Naming obj. from memory      .334   -.294    .653
L,IV,3       Pict. completion: man       -.219    [?]     .503
L,IV,4       Pictorial identification    -.289   -.227
L,IV,5       Discrimination of forms      .318   -.136    .679
L,IV,6       Comprehension II            -.337   -.097    .802
M,III-6,a    Matching objects             .348   -.009
M,III-6,1    Comparison of balls          .165   -.035
M,III-6,2    Patience: pictures           .497   -.341
M,III-6,3    Discrim. animal pict.        .286   -.061
M,III-6,4    Response to pict. I          .161    .292
M,III-6,5    Sorting buttons              .308   -.088
M,III-6,6    Comprehension I             -.270   -.280
M,IV,1       Picture vocabulary          -.270   -.351    .650
M,IV,2       Stringing beads              .260    .156    .400
M,IV,3       Opposite analogies I        -.347    .401    .754
M,IV,4       Pictorial identification    -.308   -.245    .546
M,IV,5       Number concept of two        .119?   .417    .715
M,IV,6       Memory for sentences I      -.272    .221    .727

Σk²/n                                     .076    .063

[The k₁ column and its Σk²/n are not recoverable. Entries
followed by ? are uncertain readings; [?] marks an entry
that cannot be recovered.]


TABLE 33


FACTOR LOADINGS FOR ANALYSIS AT AGE 4

Location    Name of test                 k1
L,IV,a      Memory for sentences I       .603
L,IV,1      Picture vocabulary           201
L,IV,2      Naming obj. from memory      .653
L,IV,3      Pict. completion: man        .503
L,IV,5      Discrimination of forms      .679
L,IV,6      Comprehension II             .802
L,IV-6,a    Pictorial identification     .728
L,IV-6,1    Aesthetic comparison         .602
L,IV-6,2    Repeating 4 digits           .435
L,IV-6,3    Pictorial like. and diff.    .470
L,IV-6,4    Materials                    .660
L,IV-6,5    Three commissions            .445
L,IV-6,6    Opposite analogies I         .778
M,IV,1      Picture vocabulary           .650
M,IV,2      Stringing beads              .400
M,IV,3      Opposite analogies I         4154
M,IV,4      Pictorial identification     .546
M,IV,5      Number concept of two        .715
M,IV,6      Memory for sentences I       .727
M,IV-6,a    Patience: pictures           .565
M,IV-6,1    Discrim. animal pict.        .501
M,IV-6,2    Definitions                  .649
M,IV-6,3    Repeating 4 digits           .697
M,IV-6,4    Picture completion: bird     .611
M,IV-6,5    Materials                    .174
M,IV-6,6    Comprehension II             .151
Σk²/n                                    .411
AT AGE 4
k2 k3
-.249  .212 
.145  .206 
.291 .035 
.426 .235 
.154 -.090 
-.192 -.226 
.229 -.188 
.137 -.173 
-.291  .419 
-.211  .124 
.281 -.281 
-.175 -.201 
-.190 -.207 
.079 .137 
.201 -.650 
-.076 -.245 
.084 .203 
.315 -.226 
.089 .330 
-.363 .291 
.273 .188 
-.850 -.109 
-.424 .300 
.282 .266 
-.208 -.199 
-.213 -.193 
.061 .065 


k1 (4½)


.718 
.623 
.533 
.578 
.692 
.409 
.199 


FACTOR LOADINGS FOR ANALYSIS 


Location 


L,IV-6,a 
L,IV-6,1 
L,IV-6,2 
L,IV-6,3 
L,IV-6,4 
L,IV-6,5 
L,IV-6,6 


L,V,a 
L,V,1 
L,V,2 
L,V,3 
L,V,4 
L,V,5 
L,V,6 


M,IV-6,a 
M,IV-6,1 
M,IV-6,2 
M,IV-6,3 
M,IV-6,4 
M,IV-6,5 


M,V,1 
M,V,2 
M,V,3 
M,V,4 
M,V,5 
M,V,6 


TABLE 34 


Name of test 


Pictorial identification 
Aesthetic comparison 
Repeating 4 digits 
Pictoriallike and diff. 
Materials 

Three commissions 
Opposite analogies I 


Knot 

Picture completion:man 
Paper folding:triangle 
Definitions 

Copying square 

Memory for sentences II 
Counting 4 objects 


Patience: pictures 
Discrim. animal pict. 
Definitions 

Repeating 4 digits 
Picture completion:bird 
Materials 


Picture vocabulary 
Number concept of three 
Pictorial sim. and diff. 
Patience:rectangles 
Comprehension II 
Mutilated pictures 


£k?/n 
AT AGE 4½
k2 k3
«ДАЛ, .173 
-.213 -.158 
-.148 -.307 
-.051  .441 
-.107 .230 
-.384 -.499 
.070 .415 
.170  .320 
.542 -.115 
.415 -.361 
-.286 .411 
.147 -.384 
.038 .034 
.053 -.015 
-.142 -.129 
.068 -.174 
-.601  .284 
-.451 -.349 
.146 -.163 
.300  .279 
.013 -.104 
.384 .140 
-.054 -.043 
-.200 -.129 
-.109  .203 
.331 -.096 
.072 .071 


k,(5) 


547 
431 
.536 
.703 
.509 
.569 
.651 


.670 
.480 
.610 
.470 
.630 
.613 


TABLE 35


FACTOR LOADINGS FOR ANALYSIS AT AGE 5

Location    Name of test                  k1
L,V,a       Knot                          .547
L,V,1       Picture completion: man       .431
L,V,2       Paper folding: triangle       .536
L,V,3       Definitions                   .703
L,V,4       Copying square                .509
L,V,5       Memory for sentences II       .569
L,V,6       Counting 4 objects            .651
L,VI,2      Copying bead chain mem. I     .709
L,VI,3      Mutilated pictures            .564
L,VI,4      Number concepts               .667
L,VI,5      Pictorial like. and diff.     .610
L,VI,6      Maze tracing                  .482
M,V,1       Picture vocabulary            .670
M,V,2       Number concept of three       .480
M,V,3       Pictorial sim. and diff.      .610
M,V,4       Patience: rectangles          .470
M,V,5       Comprehension II              .630
M,V,6       Mutilated pictures            .613
M,VI,1      Number concepts               .663
M,VI,2      Copying bead chain            .763
M,VI,3      Differences                   .659
M,VI,4      Response to pictures I        .525
M,VI,5      Counting 13 pennies           .671
M,VI,6      Opposite analogies I          .619
Σk²/n                                     .365
k1 (6)


.652 
.406 
.770 
.726 
.573 


.667 
.495 
.498 
.511 
.528 
.708 


TABLE 36


FACTOR LOADINGS FOR ANALYSIS AT AGE 6


Location 


L,VLi 
L,VL2 
L,V13 
L,VL4 
L,V1,5 
L,VL6 


L,VILi 
L,VIL2 
L,VIL3 
L,VILA 
L,VIL5 
L,VIL6 


M,VI,1 
M,VI,2 
M,VL3 
M,VI,4 
M,VI,5 
M,VL6 


M,VII,1 
M,VIL2 
M,VIL3 
M,VILA 
M,VIL5 
M,VIL6 


Name of test 


Vocabulary 

Copying bead chain mem. I 
Mutilated pictures 
Number concepts 
Pictorial like, and diff. 
Maze tracing 


Picture absurdities I 
Similarities: 2 things 
Copying diamond 
Comprehension III 
Opposite analogies I 
Repeating 5 digits 


Number concepts 
Copying bead chain 
Differences 

Response to pictures I 
Counting 13 pennies 
Opposite analogies I 


Giving no. of fingers 
Memory for sentences II 
Picture absurdities I 
Repeat 3 digits reversed 
Sentence building I 
Counting taps 


£k Yn 


131 


к1(7) 


.476 
739 
.625 
.682 
.654 
.565 


.586 
.636 
.684 
101 
.674 
.493 


FACTOR LOADINGS FOR ANALYSIS 


Location 


L,VIL1 
L,VIL2 
L,VIL3 
L,VILA 
L,VILS5 
L,VIL,6 


БУШ 
L,VIIL2 
L,VIIL3 
L,VIILA 
L,VIILS5 
1,УШ,6 


М,УП,1 
M,VIL2 
M,VIL3 
M,VIL4 
M,VILS5 
M,VIL6 


М,УШ,1 
М,УШ,2 
M,VIIL3 
M,VIILA 
М,уш,5 
М,УШ,6 


TABLE 37 


Name of test 


Picture absurdities I 
Similarities: 2 things 
Copying diamond 
Comprehension Ш 
Opposite analogies I 
Repeating 5 digits 


Vocabulary 

Memory for stories 
Verbal absurdities I 
Similarities and diff, 
Comprehension IV 
Memory for sentences ш 


Giving по. of fingers 
Memory for Sentences II 
Picture absurdities I 
Repeat 3 digits reversed 
Sentence building I 
Counting taps 


Comprehension III 
Similarities: 2 things 
Verbal absurdities I 
Naming days of week 
Problem situations 
Opposite analogies II 


£k7/n 
.438


AT AGE 7 
ко kg 
-.233  .125 
-.463 .283 
-.079 .133 
-.172 -.076 
173 .067 
.164 -.253 
4272 1177 
.168 1191 
-.140 -.043 
-.313  .227 
.256 .170 
-.103 -.373 
-.328 -.264 
.025 -.190 
.169 .138 
.364 -.241 
.162 -.412 
-.007 -.329 
.133 .077 
-.233 ,344 
.255 .100 
-.266 -.337 
-.109  .302 
.357 .182 
054 .055 


К 1(9) 


.682 
142 
.808 
101 
.556 
.444 


FACTOR LOADINGS FOR ANALYSIS 


Location 


L,VIIL2 
L,VIIL3 
L,VIILA 
L,VIIL5 
L,VIIL6 


1х1 
L,IX,2 
L,IX,3 
L,Ix,4 
L,IX,5 
L,IX,6 


LXI 
L,X,2 
L,X,3 
L,X,4 
L,X,5 
L,X,6 


M, VIII, 1 
м,уш,2 
м,уш,3 
м,уш,4 
М,УШ,5 
М,УШ,6 


M,IX,1 
M,IX,2 
M,IX,3 
M,IX,4 
M,IX,5 
M,IX,6 


M,X,1 
M,X,2 
M,X,3 
M,X,4 
M,X,5 
M,X,6 


TABLE 38 


Name of test 


Memory for stories 
Verbal absurdities I 
Similarities and diff. 
Comprehension IV 
Memory for sentences Ш 


Paper cutting 1 

Verbal absurdities II 
Memory for designs 
Rhymes:new form 
Making change 

Repeat 4 digits reversed 


Vocabulary 

Picture absurdities II 
Reading and report 
Finding reasons I 
Word naming 
Repeating 6 digits 


Comprehension Ш 
Similarities: 2 things 
Verbal absurdities I 
Naming days of week 
Problem situations 
Opposite analogies II 


Memory for designs 1 
Dissected sentences I 
Verbal absurdities II 
Similarities and diff. 
Rhymes: old form 
Repeat 4 digits reversed 


Block counting 
Memory for stories I 
Verbal absurdities III 
Abstract words I 
Word naming: animals 
Repeating 6 digits 


£ k ?/n 


—" 


AT AGE 9 
к. Ky 
-.189  .312 
.115 .012 
115 -.178 
.356 1172 
-.098 -.192 
A79 1181 
1188 .212 
-.139 .347 
-.198 -.124 
-.307 -.120 
-.324 -.164 
1184  .170 
.232 1122 
-.220 .107 
.219 -.025 
.062 .001 
-.524 -.306 
146 -.219 
-.181 -.107 
1180 -.176 
-.322 .118 
.213 — 243 
.505 -.104 
-180  .313 
-.195 -.076 
.262 .100 
119 =.228 
102 -.242 
-.268 -.120 
.213 .145 
-.019 111 
163  .110 
052° .037 
.254 -.332 
-.463 -.249 
059 .036 


к, (11) 


.269 
.735 
.723 


497 
456 


TABLE 39 


FACTOR LOADINGS FOR ANALYSIS AT AGE 11


13) 
Location Name of test k; к ks ki 
L,X,2 Picture absurdities II .428 .100 -.512 

L,X,3 Reading and report 643 .157 .194 

L,X,4 Finding reasons I .409 -.263 .003 

L,X,5 Word naming .480  -.168 .296 

L,X,6 Repeating 6 digits .499 -.467  .190 

L,XLl Memory for designs .325 -.045 -.322 

L,XL2 Verbal absurdities Ш 698 .342 -.181 

L,XL3 Abstract words I .868 .240  .195 


L,XL4 Memory for sentences IV 1579 -.217  .189 
L,XL5 Problem situation 


L,X1,6 Similarities: 3 things 


L,XH,1 Vocabulary 
L,XII,2 Verbal absurdities II 
L,XIL3 Response to pict. II .683 


L,XIL4 Repeat 5 digits reversed 452 ..425 257 .449 
L,XILS Abstract words II 


1758  .278 .225 ғ 
L,XIL6 Minkus completion 610 -.113  .062 .585 
M,X,1 Block counting -269 .062 -.252 
M,X,2 Memory for stories I 4135 :160 4111 
M,X,3 Verbal absurdities III (123  .291 -.114 
M,X,5 


Word naming: animals .497 -.118 -.003 
M,X,6 Repeating 6 digits 


M,XI,1 Finding reasons 
M,XL2 Copying bead chain mem, 348 -.239 -.305 
М,Х1,3 Verbal absurdities п 


М,ХП,1 Memory for designs II 


697 -.210 -.406 611 
M,XIL3  Minkus completion .630  .119  .129 .422 
M,XIL4 Abstract words 1 .827 246 1119 .800 
M,XII,5 Picture absurdities II 619 -.027 ..249 .702 
M,XIL6 Repeat 5 digits reversed 


-595 -.404 186 .371 
£k Yn 
TABLE 40


FACTOR LOADINGS FOR ANALYSIS AT AGE 13

Location    Name of test                   k1      k2      k3     k1 (15)
L,XII,2     Verbal absurdities II         .105    .198    .013
L,XII,3     Response to pict. II          .576    .273    .125
L,XII,4     Repeat 5 digits reversed      .449   -.270   -.351
L,XII,6     Minkus completion             .585    .310   -.190
L,XIII,1    Plan of search                .555   -.250    .337
L,XIII,2    Memory for words              .677   -.185   -.066
L,XIII,3    Paper cutting I               .512   -.209   -.129
L,XIII,4    Problems of fact              .390    .161   -.164
L,XIII,5    Dissected sentences           .666    .344   -.136
L,XIII,6    Copying bead chain mem. II    .465   -.229   -.259
L,XIV,1     Vocabulary                    .850    .250    .306    2
L,XIV,2     Induction                     .641   -.252   -.025    .647
L,XIV,3     Picture absurdities III       .674   -.246    .345    .612
L,XIV,4     Ingenuity                     .658   -.256   -.045    113
L,XIV,5     Orientation: direction I      .662   -.239   -.258    .576
L,XIV,6     Abstract words II             .827    .344    .178    .817
M,XII,1     Memory for designs II         .611   -.249    .130
M,XII,3     Minkus completion             .422    .257   -.254
M,XII,4     Abstract words I              .800    .400    .213
M,XII,5     Picture absurdities II        .702   -.236    .222
M,XII,6     Repeat 5 digits reversed      .371    .144   -.265
M,XIII,1    Plan of search                .533   -.273    .357
M,XIII,2    Memory for stories II         .405    .140    .089
M,XIII,4    Abstract words II             .638    .082    .158
M,XIII,6    Memory for sentences IV       .522   -.004   -.171
M,XIV,1     Reasoning                     .501    .105   -.121    .673
M,XIV,3     Orientation: direction I      .617   -.169   -.213    -
M,XIV,4     Abstract words III            .857    .223    .171    -
M,XIV,5     Ingenuity                     .553   -.296    .092    -
M,XIV,6     Reconciliation opposites      .452    .100    .119    .525
Σk²/n                                     .383    .057    .048
FACTOR LOADINGS FOR ANALYSIS


Location 


L,XIV,2 
L,XIV,3 
L,XIV,4 
L,XIV,5 
L,XIV,6 


L,AA,1 
L,AA,2 
L,AA,3 
L,AA,4 
L,AA,5 
L,AA,7 
L,AA,8 


L,SAL2 
L,SAI,3 
L,SAI,4 
L,SAI,5 
L,SAI,6 


M,XIV,1 
M,XIV,6 


M,AA,1 
M,AA,2 
M,AA,3 
М,АА,4 
М,АА,5 
M,AA,6 
M,AA,7 
M,AA,8 


M,SAI,1 
M,SAI,2 
M,SAI,5 


TABLE 41 


Name of test 


Induction 

Picture absurdities Ш 
Ingenuity 

Orientation: direction I 
Abstract words П 


Vocabulary 

Codes 

Diff. abstract words 
Arithmetical reasoning 
Proverbs I 


Memory for sentences V 
Reconciliation Opposites 


Enclosed box problem 
Minkus completion 
Repeat 6 digits reversed 
Sentence building 
Essential Similarities 


Reasoning 
Reconciliation Opposites 


Abstract words Ш 
Ingenuity 

Opposite analogies Ш 
Codes I 

Proverbs I 
Orientation: direction I 
Essential differences 
Binet paper cutting 


Minkus completion 


Opposite analogies IV 
Sentence building II 


Ék?/n 


AT AGE 15 
К2 Ез 
-.352 .162 
-.160 -.202 
-.231 .044 
-.419 -.105 
-.070 -.141 
.315 .304 
-.079 -.214 
1128 -.072 
-.130 -.119 
.352 -.347 
.099 .067 
-.111 -.166 
-.255 .288 
.308 -.067 
-.144 ,430 
.276 .418 
.166 -.244 
-.176 .365 
-.276 -.167 
-106 .213 
-.391 -.057 
317 .112 
190 -.078 
.457 -.196 
-.477 -.220 
-153 -.148 
-.176 -.243 
127. -.187 
333 ‚отт 
152 119 
067 (046 


к1(18) 


517 
103 
.592 
.657 
.595 


.575 
.589 
.806 


TABLE 42 


FACTOR LOADINGS FOR ANALYSIS AT AGE 18


Location 


L,SAI,1 
L,SAI,2 
L,SAI,3 
L,SAI,4 
L,SAI,5 
L,SAI,6 


L,SAIL2 
L,SAII,3 
L,SAII,4 
L,SAII,5 
L,SAIL,6 


L,SAIIL2 
L,SAIIL3 
L,SAIII,4 
І,5АШ,5 
L,SAIII,6 


MSAI,1 
M,SALZ 
M,SALS 
M,SAI,6 


M,SAIL1 
M,SAIL 2 
M,SAIL3 
M,SAII,4 
М,ЅАП,5 


М,ЗАШ,1 
M,SAIII,2 
M,SAIII,3 
М,5АШ,4 
M,SAIII,6 


Name of test 


Vocabulary 

Enclosed box problem 
Minkus completion 
Repeat 6 digits reversed 
Sentence building 
Essential similarities 


Finding reasons II 
Repeating 8 digits 
Proverbs II 
Reconciliation opposites 
Repeat tho’t passage 


Orientation: direction II 
Opposite analogies II 
Paper cutting II 
Reasoning 

Repeating 9 digits 


Minkus completion 
Opposite analogies IV 
Sentence building II 
Reconciliation opposites 


Proverbs II 
Ingenuity 

Essential differences 
Repeating 8 digits 
Codes II 


Proverbs III 

Memory for sentences V 
Orientation: direction II 
Repeating 9 digits 
Repeat tho't passage II 
Chapter X
SPECIAL SCALES 


It is the purpose of this chapter to describe, and
set forth data on, three special scales: vocabulary as a
very abbreviated measure of intelligence, a series of
items as a non-verbal scale of intelligence, and a scale
for ‘immediate memory.’ The reason for presenting data
on vocabulary should be obvious: there are times when
an individually administered, quickly and easily
determinable, rough measure of intelligence is needed. The
history of intelligence testing provides ample reasons for
believing that a suggested non-verbal scale with adequate
norms is highly desirable. The frequent use by clinicians
of ‘memory’ items as evidence for or against possible
memory deterioration suggests the desirability of two
things: first, a critical examination of the meaning and
dependability of such ‘memory’ measures; and second,
the presentation of norms so that those who by choice or
necessity insist on so gauging memory will at least have
an adequate basis for interpreting an obtained score.


Vocabulary 


It will be recalled that the vocabulary test consists
of 45 words carefully chosen and arranged for difficulty.
As a single test it has all the advantages claimed
for an individually administered test, and therefore may
be preferred to the typical synonym subtest of group
scales. Scored by the standards for passing at a given
C.A. level, vocabulary tends to yield the highest biserial
r's with total score, and scored in terms of number of
words passed it yields product-moment correlations with
composite M.A.’s of .71, .83, .86, and .83 for ages 8, 11,
14, and 18 respectively (N’s of 200 plus at 8, 11, and 14;
101 at 18). These correlations are in part spurious because
the vocabulary test is included in M.A. determination,
but the degree of spuriousness is not serious since
the vocabulary test represents less than 5 per cent of
the total number of items entering into M.A. scores. The
magnitude of these correlations indicates that the
vocabulary test alone constitutes a good rough measure of
intelligence. We have no reliability coefficient for the
vocabulary test, but the size of its ‘validity’ coefficients,
given above, is such that one need not worry much about
reliability.

In Table 44 will be found normative data for the
vocabulary test scored in terms of the number of words
passed. The N for each age 15 to 18 is 100 plus, and
for ages 7 to 14 it is 200 plus with the exception of age
7, where it is only 194 owing to the fact that the test
was omitted for 8 cases because of improper location in
the provisional form. The performance of 4 of these 8
subjects was such that one can be confident that their
scores would have been zero. Accordingly, the computed
mean, 7.15, for age 7 has been corrected downward to
the 7.0 reported in the table. The needed adjustment to
the standard deviation was not made.
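
The arithmetic behind the correction can be sketched as follows. This is only an illustration under stated assumptions: the 4 omitted subjects are each given a score of zero, the S.D. of 2.0 is taken to have been computed on the 194 tested cases with N as divisor, and the function name is hypothetical. With these assumptions the corrected mean rounds to the 7.0 reported, and the sketch also shows roughly how large the adjustment to the S.D., which the text says was not made, might have been.

```python
import math


def add_zero_scores(mean, sd, n, n_zeros):
    """Return (mean, sd) after adding n_zeros scores of 0 to n cases.

    Assumes sd was computed with divisor n (an assumption; the
    source does not say which divisor was used).
    """
    total = mean * n                      # sum of the observed scores
    sum_sq = n * (sd ** 2 + mean ** 2)    # sum of squared scores
    new_n = n + n_zeros                   # zeros add nothing to either sum
    new_mean = total / new_n
    new_var = sum_sq / new_n - new_mean ** 2
    return new_mean, math.sqrt(new_var)


# Values taken from the text: mean 7.15 on 194 cases, 4 assumed zeros.
corrected_mean, adjusted_sd = add_zero_scores(7.15, 2.0, 194, 4)
print(round(corrected_mean, 1))   # the corrected mean for age 7
print(round(adjusted_sd, 2))      # an illustrative adjusted S.D.
```

Because the tabled S.D. is itself rounded, the adjusted S.D. printed here should be read as approximate.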

It will be noted that the ‘growth’ curve for
vocabulary as here measured shows slight negative
acceleration, and that the variability increases with age. A high
score of 38 was attained by two subjects, one at age 17,
the other at 18.

use of this brief vocabulary test as a measure of intellect,
although there may be circumstances when it would
constitute a far better indicator than no test at all.
TABLE 44


NORMS: MEANS AND STANDARD DEVIATIONS FOR
VOCABULARY TEST, SCORED AS NUMBER
OF WORDS PASSED


Age    Mean    S.D.
  7     7.0    2.0
  8     8.3    2.2
  9    10.0    2.7
 10    11.4    3.1
 11    13.7    3.9
 12    15.4    4.7
 13    17.4    5.1
 14    18.2    4.9
 15    20.0    5.2
 16    21.2    5.4
 17    22.0    4.8
 18    22.5    5.6


A Tentative Non-Verbal Scale 


When assembling the items for the New Revision,
it was hoped that enough non-verbal material could be
included to permit the construction of a non-verbal form
which would parallel two verbal forms as regards
difficulty, reliability, and validity. Despite the large number
and diversity of the items utilized in the preliminary
work and later in the two provisional forms, it was not
possible to realize this goal. It was thought desirable,
however, to include as much of the non-verbal material,
especially at the lower end, in the final forms as seemed
to satisfy the requirements laid down for the retention of
items. The presence of this more or less non-verbal
material in the final form has led to the question as to
what might be expected of these retained items as a
separate scale. Accordingly, we here present a brief
statistical analysis of two non-verbal forms, somewhat
balanced for content, difficulty, and validity. The 40 items
utilized were selected by Dr. Merrill as being non-verbal
or at least as being less verbal than the other tests
contained in Forms L and M. Since the directions for
these items are mainly verbal rather than pantomime, it
follows that some understanding of language is involved
and consequently that the items are not to be regarded
as purely non-verbal.


The 20 items for Form I and for Form II are
listed in Tables 45 and 46 by location in Forms L and M
and by name. It will be noted that those in Form I are
predominantly from Form L, and that 14 of the 20 tests
of each form are located at level VI or lower, with only
6 items scattered from level VII to the adult levels.
Although we have analyzed the results for each age from 2
to 18, the fewness of items beyond age level VI does not
warrant any detailed presentation of data for ages
beyond 8. The essential data regarding age means, sigmas,
form versus form reliabilities, and correlations of
each form with M.A. based on a composite of Forms L
and M are presented in Table 47. Because of the
restricted range in non-verbal scores, all correlations in
this table are tetrachorics. Their standard errors will
be about .10 to .12 for ages 2 to 5-1/2, and about .07 to
.08 for ages 6, 7, and 8.

Reference to Table 47 shows that the scales correlate
moderately with mental age, but these correlations
are spuriously high because the non-verbal tests are
involved in mental ages. If this spurious element were
eliminated, one might expect the correlation to average
about .65 instead of about .70. The reliabilities average
near .65. One must refrain from correcting for attenuation
the correlations between M.A. and the non-verbal
scales because the measurement errors will surely be
correlated. It would appear that a scale of so few items
yields reliabilities of insufficient size to warrant
recommending the use of a single form; both forms combined
TABLE 45


ITEMS INCLUDED IN TENTATIVE FORM I
OF NON-VERBAL SCALE


L,II,1           Three-hole form board
L,II,4           Block building: tower
L,II-6,6         Three-hole form board: rotated
L,III,1          Stringing beads
L,III,5          Copying a circle
L,III-6, alt.    Drawing a cross
M,III-6, alt.    Matching objects
L,IV,3           Picture completion: man
L,IV,5           Discrimination of forms
M,IV, alt.       Discrimination of animal pictures
L,V,1            Picture completion: man
L,V,2            Paper folding: triangle
L,V,4            Copying a square
L,VI,2           Copying a bead chain from memory I
L,VII,3          Copying a diamond
L,IX,1           Paper cutting I
L,IX,3           Memory for designs
L,XI,1           Memory for designs
L,XIII,6         Copying a bead chain from memory II
L,S.A. III,4     Paper cutting II
TABLE 46


ITEMS INCLUDED IN TENTATIVE FORM II
OF NON-VERBAL SCALE


M,II,1           Delayed response
M,II-6,2         Motor coordination
M,II-6, alt.     Stringing beads
M,III,4          Drawing a vertical line
L,III,3          Block building: bridge
L,I, alt.        Three-hole form board: rotated
M,III-6,3        Discrimination of animal pictures
M,III-6,5        Sorting buttons
M,IV,2           Stringing beads
M,IV-6,1         Discrimination of animal pictures
M,IV-6,4         Picture completion: bird
M,V,4            Patience: rectangles
L,V, alt.        Knot
M,VI,2           Copying a bead chain
M,IX,1           Memory for designs I
M,X,1            Block counting
M,XI,2           Copying a bead chain from memory
M,XII,1          Memory for designs II
L,XIII,3         Paper cutting I
M,A.A.,8         Binet paper cutting
TABLE 47


DATA ON NON-VERBAL SCALES


                                                      Composite or sum
         Form II            Correlations              of Forms I & II
Age      M     S.D.    I vs.    II vs.    I vs.       M      S.D.
                       M.A.     M.A.      II
2        1.9   1.2     .53      .60       .53         4.0    2.1
2½       4.4   1.9     .14      .64       .15         8.1    3.4
3        6.3   2.1     stl      .84       .56        11.9    3.6
3½       8.6   1.9     .71      .64       .76        16.6    3.8
4        9.9   1.9     .77      .57       .61        19.2    3.8
4½      11.2   1.8     .63      .76       .53        22.5    3.4
5       12.4   1.5     .53      .66       .19        24.6    3.2
5½      13.1   1.1     .67      .12       .71        26.1    2.3
6       13.7   1.0     .77      .61       .76        27.4    2.3
7       14.3   1.0     .70      .58       .52        29.2    2.2
8       14.9   1.2     .60      .70       .55        30.7    2.1
would have a reliability in the vicinity of .79, which is
also unsatisfactory.
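
The figure of .79 is what one obtains if the form-versus-form reliability of about .65 is stepped up for doubled length. A minimal sketch, assuming the step-up intended is the Spearman-Brown formula (the text does not name it), and using a hypothetical function name:

```python
def spearman_brown(r, k=2.0):
    """Estimated reliability of a test lengthened k times,
    given reliability r of the original-length test."""
    return k * r / (1.0 + (k - 1.0) * r)


# Average single-form reliability of the non-verbal scale, from the text.
both_forms = spearman_brown(0.65)
print(round(both_forms, 2))
```

The same function applied to a single-form reliability of .70 gives about .82, the value quoted for the combined memory scales.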

The unreported data for age groups 9 to 18 show
still lower reliabilities, and also lower correlations for
the non-verbal versus mental ages. This, of course, is
not unexpected since there are still fewer non-verbal
items at these levels. The distributions become more
and more skewed for lack of top. It might be of interest
to note here that the 40 items chosen as non-verbal tend
to have first factor loadings which average .50 as
compared to about .60 to .65 for all the items of the New
Revision. The fact of variation among the 40 items as
regards their general factor saturation suggested that a
system of weights based upon magnitude of first factor
loadings might improve the correlations between the
non-verbal scales and mental age. Actual tryout at ages 4,
5, 10, and 14 gave evidence that such a basis for
weighting would not increase the correlations.


Memory 


From each of Forms L and M, 22 items, well
scattered throughout the age levels, were chosen on an a
priori basis as items which could be said to measure
‘memory,’ or more precisely ‘immediate memory.’ The
tests or items so selected are listed in Tables 48 and 49.
Two memory scores were determined for all the individuals
in the standardization group from ages 2 to 18. The
two sets of memory scores were correlated with each
other and each was correlated with composite mental
age. These correlations were computed as tetrachorics
because of the limited range of the memory scores.
Means and standard deviations for the two scales, and
for scores obtained by combining the two, were calculated.
These statistics, which are presented in Table 50, need
little discussion.

The reliabilities tend to average .70, which when
TABLE 48
MEMORY SCALE, FORM I (L)


II-6,5        Repeating 2 digits
III,4         Picture memories
III,6         Repeating 3 digits
IV,2          Naming objects from memory
IV-6,2        Repeating 4 digits
IV-6,5        Three commissions
V,5           Memory for sentences II
VI,2          Copying a bead chain from memory I
VII,6         Repeating 5 digits
VIII,2        Memory for stories
VIII,6        Memory for sentences III
IX,3          Memory for designs
IX,6          Repeating 4 digits reversed
X,3           Reading and report
X,6           Repeating 6 digits
XII,4         Repeating 5 digits reversed
XIII,2        Memory for words
XIII,6        Copying a bead chain from memory II
A.A.,7        Memory for sentences V
S.A. II,3     Repeating 8 digits
S.A. II,6     Repeating thought of passage
S.A. III,6    Repeating 9 digits
TABLE 49
MEMORY SCALE, FORM II (M)


II,1          Delayed response
II-6,5        Repeating 2 digits
III,6         Repeating 3 digits
IV,6          Memory for sentences I
IV-6,3        Repeating 4 digits
VII,2         Memory for sentences II
VII,4         Repeating 3 digits reversed
IX,1          Memory for designs I
IX,6          Repeating 4 digits reversed
X,2           Memory for stories I
X,6           Repeating 6 digits
XI,2          Copying a bead chain from memory
XI,6          Memory for sentences III
XII,1         Memory for designs II
XII,6         Repeating 5 digits reversed
XIII,2        Memory for stories II
XIII,6        Memory for sentences IV
S.A. I,4      Repeating 6 digits reversed
S.A. II,4     Repeating 8 digits
S.A. III,2    Memory for sentences V
S.A. III,4    Repeating 9 digits
S.A. III,6    Repeating thought of passage II
TABLE 50


DATA ON MEMORY SCALES

                                                              Composite or sum
       Form I (L)     Form II (M)       Correlations          of Forms I & II
Age    M     S.D.     M     S.D.    I vs.   II vs.   I vs.     M      S.D.
                                    M.A.    M.A.     II
2       .8    .9      1.3    .8     .79     .75      .66       2.1    1.5
2½     2.2   1.4      2.2   1.2     ат      42       2048      4.4    2.6
3      2.8   1.4      2.9   1.3     .60     .62      .59       5.7    2.4
3½     4.5   1.6      3.9   1.0     .83     .56      .69       8.4    2.4
4      5.0   1.6      4.1    .9     .83     .57      .68       9.1    2.3
4½     6.2   1.7      4.6    .8     .83     .83      .78      10.8    2.4
5      6.9   1.7      4.9    .9     .56     .58      .77      11.8    2.5
5½     7.6   1.7      5.2   1.0     т       16       .55      12.8    2.4
6      8.2   1.7      5.7   1.2     .67     .92      .65      13.9    2.6
7     10.0   2.1      а     27      лә      40       Л4       ма      3.5
8     11.4   2.1      8.7   2.2     .78     .66      .71      20.1    4.0
9     13.0   2.3     10.4   2.5     UD      .75      .80      23.4    4.6
10    14.3   2.2     12.0   2.6     .82     .86      .68      26.3    4.4
11    15.4   2.3     13.1   3.0     .75     .83      .78      28.4    5.0
12    16.0   2.5     14.2   2.9     .68     .84      .68      30.2    5.0
13    16.9   2.2     15.2   2.4     Л6      00       .74      32.1    4.2
14    17.0   2.0     15.5   2.3     .82     .83      .82      32.5    4.1
15    17.5   2.1     16.1   2.8     .90     .88      .69      33.6    4.5
16    17.9   2.2     16.4   2.6     .74     т        .80      34.3    4.6
17    18.2   1.9     16.8   1.9     Л:      056      .54      35.0    3.3
18    18.3   2.0     16.9   2.2     .86     .75      .70      35.2    3.9
stepped up would indicate a reliability of about .82 for
scores based on the items in both scales. Since clinicians
and other workers seldom administer more than
one form, the memory ability which they infer from the
‘memory’ items will possess a reliability of only .70,
which is too low for individual diagnosis.

To argue that such an unreliable measure of
memory is better than none at all overlooks another
pertinent fact: the memory scores correlate just about
as high with mental age as the reliabilities permit. An
exact determination of the correlation to be expected
between perfectly measured mental age and memory (as
here defined operationally) is complicated by the spurious
nature of the obtained correlations between memory and
mental age and by the likelihood of correlated errors.
Any reasonable allowance for these effects will lead to
the conclusion that ‘memory’ as determined by the items
of a ‘memory’ nature in the New Revision is not very
different from the general intelligence being measured by
the scale as a whole. The first factor loadings for the
memory items average about .05 lower than the average
for all the items; thus the memory items are not quite
so highly saturated with the central function being
measured. It would appear, therefore, that those clinicians
who continue to have faith in the utility of certain Binet
items as a measure of memory or ‘immediate’ memory
may find but little to support their position.
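
The phrase 'as high as the reliabilities permit' can be made concrete with the classical attenuation bound, under which the correlation between two fallible measures cannot exceed the square root of the product of their reliabilities when errors are uncorrelated. The sketch below is illustrative only: the reliability of .70 for a single memory form comes from the text, while the reliability of .90 for composite mental age is a hypothetical value chosen for the example, and the function name is likewise an assumption.

```python
import math


def max_observable_r(rel_x, rel_y):
    """Upper bound on an observed correlation between two measures
    with the given reliabilities, assuming uncorrelated errors."""
    return math.sqrt(rel_x * rel_y)


memory_reliability = 0.70         # single memory form, from the text
assumed_ma_reliability = 0.90     # hypothetical, for illustration only

ceiling = max_observable_r(memory_reliability, assumed_ma_reliability)
print(round(ceiling, 2))
```

Since the text itself warns that the errors here are probably correlated and that the obtained correlations are partly spurious, the bound should be taken only as a rough guide to why correlations in the .70's and .80's leave little room for a distinct memory function.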

Of course, this whole issue is a problem in a
broader field of study, namely, the organization of
abilities. Our own factor analyses (see Chapter IX) are too
limited to throw much light on ‘memory’ as a factor.
Items which logically seem to call for some sort of
memory do not have similar factorial patterns for the three
factors extracted. It is true, however, that the
repeating-of-digits tests do possess similar loadings, but
whether the fact indicates more than a specific
repeating-of-digits factor is questionable. Perhaps a thoroughly
intensive and extensive factor analysis of memory,
based upon a large sample or samples, is needed. But
such an analysis might not provide an answer to those
who claim that intelligence tests of the Binet type are
merely measures of memory. The argument assumes
that two individuals scoring different mental ages differ
primarily as regards memory ability, i.e. power of
retention and reproduction. Such a concept would rule out
individual differences in the ability to observe, see
relationship, or profit by experience. After all, observation
and learning must precede retentivity; even in the
case of repeating digits we have an example of
single-trial learning. The final answer as to whether variance
in measured intelligence is more dependent upon
retentivity than upon original learning, or as to the extent of
each as a contributor, must be sought in the laboratory.
We hazard the guess that securing the answer will
involve experimentation rather than wholesale correlational
analysis.


Summary 


The materials of this chapter have been presented
in order to emphasize the limitations of special scales
made up by selecting appropriate items from Forms L
and M. The vocabulary test alone yields a fairly
adequate measure of the kind of intelligence measured by the
New Revision. The dearth of non-verbal material is such
that little reliance can be placed upon a score based
solely upon the non-verbal items. Regarding ‘memory’
as inferred from items which apparently involve memory
or immediate memory, we have called attention to the
low reliability of such scores and have questioned the
logic of assuming that memory, as a function, is really
the trait tapped by these items.

In view of the questionable validity of the ‘memory’
items as a scale, in view of the findings of the factor
analyses, and in view of the low reliability for single
items, we find ourselves in perfect agreement with
Goodenough,1 who very aptly states her opinion that ‘a test of
general intelligence cannot be made to serve the purpose
of a universal diagnostic instrument’, and that ‘the
practice, so unfortunately common among clinicians, of making
pronouncements about special abilities or defects in such
broad psychological categories as memory, visual
imagery, perception, and the like on the basis of one or
two items in a Binet test, is hazardous in the extreme.’


1 F. L. Goodenough, "Review of Measuring Intelligence,"
Psychol. Bull., 1937, 34, 605-609.




Chapter XI 
UNITS OF MEASUREMENT 


Much has been written in the field of psychology
and education about measurement, and one of the topics
of chief interest and controversy has been the units
proposed. Many of the units used are purely arbitrary, even
to the extent of being accidental. So far as we know, no
one has brought forth a scale based upon units which
would satisfy the criteria of equality used in the physical
sciences. It does not follow from this that psychological
measurement is impossible unless one restrict the word
‘measurement’ to situations wherein a scale possessing
physically equal units is employed. Not all the scales
used in the physical sciences have equal units; this fact
does not nullify, but does restrict, their use.

Psychologists must at present be content to
utilize scales which have limitations. At least one claim
can be made for nearly all, if not all, psychological scales:
they permit the rank ordering of individuals, subject, of
course, to an ever present and at times disturbingly large
error. Now, the choice of unit must depend partly upon
preference and partly upon general usefulness. In
Measuring Intelligence reasons were stated for retaining the
M.A. and I.Q. scheme of scoring. We are not blind
to the shortcomings of such a system of units. Not
only have no claims been made for the equality of mental
age units but actually their inequality has been admitted
(see pages 24-29 of Measuring Intelligence). In fact, the
use of I.Q. units is predicated on inequality of mental
age units.

It is not our purpose here to review the reasons for
perpetuating the mental age and I.Q. concepts. We do propose to
examine some of the alternatives with the special intent of
scrutinizing the merits claimed for them. Before doing this we
should like to digress long enough to discuss some of the
criticisms set forth in a recent paper by Richardson.¹


Richardson’s Logic About Age Scales 


We have pointed out in Chapter VIII on per cents passing items
that Richardson seems to have been misled as to the method of
placing items in age levels. He evidently believes that the sole
criterion of item validity used was the steepness of the curves
for per cent passing, i.e., the item's correlation with age,
although it is clearly stated by Terman and Merrill that 'the
correlation of each test with composite total score (equivalent
to correlation with mental age) was computed separately for each
test, thus providing a basis for the elimination of the least
valid tests.' (Measuring Intelligence, page 22.) He also thinks
that nothing in the procedure used operates so as to select
items that measure a unique trait. These notions are so
erroneous as to need no comment.


¹ M. W. Richardson, "The Logic of Age Scales," Educ. Psychol.
Measmt., 1941, 1, 25-34.

As an example of his 'logical difficulty,' we quote:

'[...] two hypothetical "good" items [...]orrelation. A scale
made up [...]arily be unreliable as a com[...] which contained
items [...]ty, and that the possibil[...]licable over several
ages [...] A rote-memory measure [...] ten digits would not be
admissible.' One would like to know how Richardson would go
about measuring the intellect of a 6-year-old and of a
16-year-old in a manner that would avoid his 'logical
difficulty.'

On page 25 of the Terman and Merrill volume it was stated that
the expression of a test result in terms of age norms rests on
no statistical assumptions. This is characterized by Richardson
as 'erroneous and misleading,' and he goes on to say, 'The truth
of the matter is that mental age is a measure derived from raw
scores in accordance with certain assumptions.' It is
regrettable that he did not specify these assumptions.

The greatest confusion in Richardson's paper is to be found in
his discussion of I.Q. constancy. It is well known that there
are several necessary conditions for I.Q. constancy. Some of
these are mentioned by Richardson, but treated as sufficient
conditions. A few quotations are in order: 'The constancy of the
I.Q., if it exists, is imposed by the process of
standardization.' 'If the various sub-tests are properly scaled
... the I.Q. of 100 will remain constant.' 'The gist of the
matter is that the I.Q. can be made to be constant.'

All the statements just quoted are false. Let us list the
conditions necessary for I.Q. constancy, i.e., for an individual
obtaining the same I.Q., within error limits, on successive
testings over a period of time, e.g., 2 to 13 years. (1) The
same general intellectual ability must be called for at the
various age levels. (2) The standard deviations of successive
age I.Q. distributions must be equal (this means a systematic
increase in the sigma for M.A. distributions). (3) I.Q.'s and
age must be uncorrelated. These conditions, which can be
reasonably well attained, are functions of the scale, and even
if perfectly achieved neither they nor any other scale functions
will guarantee I.Q. constancy. In other words, they are not
sufficient conditions. In order to find the latter, we must look
to the individual. If the above necessary conditions obtain,
then a sufficient, also necessary, condition for I.Q. constancy
is that the growth rates of all individuals remain constant. Few
will agree that the question of growth rate is a 'false issue'
as claimed by Richardson. According to him the constancy of the
I.Q. is an example of a problem where 'the distinction between
purely psychometric issues and psychological issues [is] not
always made.' We find ourselves wondering why he himself did not
make the distinction instead of predicating that the whole thing
is a psychometric problem.
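
The bearing of these conditions on the individual can be made
concrete with a brief numerical sketch. It is an illustration
only: the I.Q. sigma of 16, the ages chosen, and the two
hypothetical children (one growing at a steady 1.2 mental years
per calendar year, one slowing after age 8) are assumptions
introduced for the example and are not data from the
standardization. The I.Q. is taken as 100 times M.A. over C.A.

```python
# Illustrative sketch only: the sigma of 16, the ages, and both
# hypothetical children are assumptions, not data from the text.

def iq(mental_age, chronological_age):
    """I.Q. as 100 times the ratio of mental to chronological age."""
    return 100.0 * mental_age / chronological_age

def ma_sigma(iq_sigma, chronological_age):
    """Sigma of M.A. implied by a given I.Q. sigma at one age."""
    return iq_sigma * chronological_age / 100.0

# Condition (2): an equal I.Q. sigma at every age forces the M.A.
# sigma to increase in direct proportion to age.
for age in (4, 8, 12):
    print(age, ma_sigma(16.0, age))

# Constant growth rate: the I.Q. is the same at every testing.
constant_rate = [iq(1.2 * age, age) for age in (4, 8, 12)]

# Growth that slackens after age 8 (1.0 mental year per year from
# then on): the I.Q. drops, however well the scale is constructed.
changing_rate = [iq(4.8, 4), iq(9.6, 8), iq(9.6 + 1.0 * 4, 12)]

print(constant_rate)
print(changing_rate)
```

The scale-level conditions are satisfied by construction in both
cases; only the first child, whose growth rate is constant, keeps
the same I.Q.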

Richardson's notion regarding the cause of increase in M.A.
variability with age is interesting. He states that 'we may
increase the standard deviations of successive year levels by
(a) selecting sub-tests which have higher intercorrelations at
older age levels, (b) assigning a larger number of mental months
to each sub-test.' This latter scheme is said to be
inadmissible; hence 'the conclusion is inescapable that the
degree of correlation between sub-tests must increase steadily
with higher age levels if the I.Q. is to be constant.' It
happens that we can present some correlations which are more
inescapable than the outcome of Richardson's logical argument.
The average intercorrelations between items beginning with age 2
are as follows: .44 (2), .35 (2-1/2), .36 (3), .41 (3-1/2),
.40 (4), .35 (4-1/2), .36 (5), .34 (6), .43 (7), .33 (9),
.36 (11), .36 (13), .48 (15), and .36 (18).
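
One simple way of checking whether the coefficients just listed
rise 'steadily' with age is to correlate them with the age
levels themselves. The sketch below does this with a
product-moment coefficient written out in full; treating the
fourteen averages as equally weighted points is our assumption
for the illustration, and the check is not part of the original
analysis.

```python
from math import sqrt

# Age levels and the average item intercorrelations quoted above.
ages = [2, 2.5, 3, 3.5, 4, 4.5, 5, 6, 7, 9, 11, 13, 15, 18]
mean_r = [.44, .35, .36, .41, .40, .35, .36,
          .34, .43, .33, .36, .36, .48, .36]

def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

trend = pearson(ages, mean_r)
print(round(trend, 2))
```

A coefficient close to zero here means that the averages show no
steady increase with age level.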

One more example of this author's logic: 'If half-year groups
have the same I.Q. dispersion, they must have approximately the
same mental age dispersion. But [...] inconsistent, and not
attainable at the same time [...] strict sense.' This is an
exam[...] simply untrue.
Note on Standard and T-Scores 


Many psychometricians have urged the universal adoption of some
form of the standard-score unit. We are well aware of certain
advantages which would accrue therefrom, but we are not
convinced that these advantages outweigh the interpretative
value of the mental age-I.Q. combination. Furthermore, we are
unable to appreciate some of the properties claimed for units of
the standard-score variety. This section will be devoted to
consideration of some of those claims.

The standard score is perhaps better adapted to the needs of
researchers. Presumably the advantage is primarily that of
having all tests scored in the same type of unit, hence making
for greater comparability than is possible with the arbitrary
and often accidental point scores. In particular it has been
argued by some that the use of standard scores would per se
avoid the problem of such differences in I.Q.'s as are found
when one passes from scale to scale. But in order to be sure
that such differences are really eliminated, or that the
standard scores are really comparable, one needs to satisfy the
condition that the several tests shall have been standardized on
samplings which are comparable as regards general level and
scatter of ability.

The claim by a surprisingly large number of psychologists that
the use of the standard-score method will yield units which are
equal or 'truly' equal deserves some attention. At this point we
should distinguish between two variant methods for deriving a
score of the standard type. First, there is the relatively
simple scheme of dividing deviations from the mean by the
standard deviation of the distribution. This, for some writers,
is the accepted technique for obtaining standard or z-scores.
The introduction of a constant multiplier, say 10, and an
additive constant, say 50, will so transform z-scores as to set
the mean at 50 and the sigma at 10, thereby getting rid of
negative scores and permitting the elimination of decimals. The
resulting scores have sometimes been called T-scores, but it
would seem wise to restrict the term T-score or T-scale to its
original meaning, which is associated with the method of scaling
used. This method, which constitutes our second variant, depends
upon converting the per cent attaining a given raw score into an
equivalent sigma by use of the normal curve functions.¹ The
resulting score is so adjusted as to yield a mean of 50 and a
sigma of 10, but it does not follow from this that 'T-scores are
Z-scores [or z-scores] multiplied by 10....' This relationship
holds only when the original distribution of raw scores is
normal. In the discussion to follow, the reader will do well to
keep in mind the distinction which we have made between z-scores
and T-scores, i.e., between the units which result from dividing
by sigma and those which are derived by recourse to normal curve
functions.
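
The distinction between the two variants may be easier to hold in
mind with a small computation. The sketch below is a minimal
illustration, not a reproduction of any published procedure: the
ten raw scores are invented, and the handling of tied scores
(counting half of the cases at a score as falling below it) is
our assumption about one common convention.

```python
from statistics import NormalDist, mean, pstdev

def linear_scores(raw):
    """First variant: 50 + 10z, where z = (X - M) / sigma."""
    m, s = mean(raw), pstdev(raw)
    return [50 + 10 * (x - m) / s for x in raw]

def normalized_t_scores(raw):
    """Second variant: per cent below each score -> normal deviate."""
    n = len(raw)
    unit_normal = NormalDist()
    scores = []
    for x in raw:
        below = sum(1 for y in raw if y < x)
        tied = sum(1 for y in raw if y == x)
        p = (below + 0.5 * tied) / n   # assumed mid-point convention
        scores.append(50 + 10 * unit_normal.inv_cdf(p))
    return scores

# Invented, markedly skewed raw scores (hypothetical data).
raw = [1, 2, 2, 3, 3, 3, 4, 5, 8, 15]
lin = linear_scores(raw)
nor = normalized_t_scores(raw)
for x, a, b in zip(raw, lin, nor):
    print(x, round(a, 1), round(b, 1))
```

With a skewed distribution such as this one the two columns part
company, most visibly at the extreme score; they would agree
closely only if the raw scores were themselves normally
distributed.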

Let us now examine the claim that z-scores (also T-scores) are
'truly equal' units and can be treated as 'though they were all
in inches or pounds.' First, consider the z-score. By definition
we have

$$z = \frac{X - M}{\sigma} = \frac{X}{\sigma} - \frac{M}{\sigma}$$

which is obviously a linear relationship of the form Y = BX + A;
hence if the original X-units are equal, the transformation will
yield z-units which are also equal. But suppose either that the
original X-units were unequal or that we were ignorant as to
their equality; will a simple linear transformation make equal
units out of unequal units, or will our ignorance be
metamorphosed into knowledge by such simple arithmetic? The
answer is so obvious that the question should hardly be
necessary.

¹ For detailed explanation see pages 151-197 of H. E. Garrett,
Statistics in Psychology and Education. New York: Longmans Green
Company, 1937.

Next, let us consider the T-score. One thing is accomplished by
T-scaling which is not achieved by the z-transformation; namely,
the distribution of T-scores will be normal, at least for the
sample used in the scaling, while that for z-scores will have
the same shape as the distribution of original raw scores. We
are aware of the advantages of normal score distributions; our
concern here is whether or not the normalizing by T-scaling has
resulted in a scale of 'truly equal' units. Is there evidence
that such is the case, or is it merely assumed? We know of no
supporting evidence, hence we question the tenability of the
assumption. As we are frankly skeptical about a purely logical
approach to the problem, we resort to an example.

A distribution containing measurements on 7749 men has been
T-scaled by the author. This N is sufficiently large to permit
of fairly exact scaling, particularly for points near the center
of the distribution. It is found that the difference between 130
and 140 original units corresponds to 5.5 T-units, while the
difference between 190 and 200 is equivalent to 3.4 T-units.
Thus, if T-units are equal, it means that a 10-point difference
in original units in one region is not equal to a 10-point
interval in another region; in fact, one is 60 per cent larger
than the other. If this is true, we have proved that the
difference between 130 and 140 pounds is 60 per cent larger than
the difference between 190 and 200 pounds. If, starting with a
scale of truly equal units (pounds), one comes out with T-units
which are definitely unequal, what can be expected when one
starts with a scale of arbitrary, admittedly unequal units?
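
The effect described above can be imitated without access to the
original weight data. In the sketch below the weights are
assumed, purely for illustration, to follow a positively skewed
(lognormal) distribution with a median of 155 pounds; both the
form of the distribution and its two constants are hypothetical
and will not reproduce the 5.5 and 3.4 T-units reported. The
T-score is obtained in the manner of the second variant, by
converting the proportion falling below a weight into a normal
deviate.

```python
from math import log
from statistics import NormalDist

UNIT_NORMAL = NormalDist()
MEDIAN_LB, LOG_SIGMA = 155.0, 0.15   # hypothetical constants

def proportion_below(weight):
    """Share of the assumed population lighter than `weight`."""
    return UNIT_NORMAL.cdf(log(weight / MEDIAN_LB) / LOG_SIGMA)

def t_score(weight):
    """Normalized T-score: proportion below -> normal deviate."""
    return 50.0 + 10.0 * UNIT_NORMAL.inv_cdf(proportion_below(weight))

def t_gap(low, high):
    """Number of T-units spanned by the interval from low to high."""
    return t_score(high) - t_score(low)

light_gap = t_gap(130, 140)
heavy_gap = t_gap(190, 200)
print(round(light_gap, 1), round(heavy_gap, 1))

# Ratio of the two intervals as reported in the text.
reported_ratio = 5.5 / 3.4
print(round(reported_ratio, 2))
```

Under these assumptions the same ten pounds span more T-units in
the lighter region than in the heavier one, which is the kind of
inequality the example in the text exhibits.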

Another type of unit has been frequently used, namely that which
results from the so-called absolute scaling methods. In this
instance the original data are the per cents passing items
rather than a distribution of scores. So far it has not been
demonstrated that absolute scaling leads to equal units. There
is at least one reason why one cannot expect any of the scaling
methods (absolute or standard or T) to yield units which are
'truly equal,' viz., the fact that all scaling must be done on
data based upon a sample of individuals. The sampling errors (in
the sigma of a distribution or in the per cent exceeding a score
or in the per cent passing an item) will always be present, and
consequently the derived units will be subject to chance
fluctuations. It follows, therefore, that these methods cannot
possibly be expected to yield equal units. We do not conclude
from this that such units have no desirable properties; we have
merely refuted the somewhat generally accepted claim that
scaling leads to 'truly equal' units.
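
The chance fluctuation of a derived unit can be seen in a small
simulation. The sketch below is illustrative only: it assumes a
normal population with mean 100 and sigma 16 (values chosen for
convenience, not taken from the data) and draws repeated samples
with a seeded pseudo-random generator, recording the sigma that
each sample would supply as the unit of a standard-score scale.

```python
import random
from statistics import pstdev

def sigma_estimates(n_cases, n_samples, seed=0):
    """Sample sigmas from repeated draws on an assumed population."""
    rng = random.Random(seed)
    return [
        pstdev([rng.gauss(100.0, 16.0) for _ in range(n_cases)])
        for _ in range(n_samples)
    ]

# One hundred samples each of two hypothetical sizes.
small = sigma_estimates(n_cases=100, n_samples=100)
large = sigma_estimates(n_cases=1600, n_samples=100)

spread_small = max(small) - min(small)
spread_large = max(large) - min(large)
print(round(spread_small, 2), round(spread_large, 2))
```

The unit varies from sample to sample in both cases; a larger N
narrows the fluctuation but does not remove it.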


Heinis Mental Growth Units 


The growth curve and unit of growth proposed by Heinis¹ have
received considerable attention among certain workers. We are
not primarily concerned about the adequacy of the Heinis growth
curve; it may describe mental development as accurately as any
of the proposed curves, but it was originally deduced from such
small samples (with little information as to the nature of the
samplings) that one might rightfully doubt its generality. We,
personally, believe that the exact form of the mental growth
curve is unknown and will likely remain unknown. So far as
mental measurement is concerned, the chief issues center about
the relative merits of various methods of expressing scores. In
particular, we should like here to examine the claim, recently
accepted by Kuhlmann, that the Heinis personal constant (P.C.)
is more constant than the I.Q.

¹ H. A. Heinis, "A Personal Constant," J. Educ. Psychol., 1926,
17, 163-186.

Strictly speaking, the issue has to do with one of the
conditions necessary for constancy, namely that the variability
of score distributions be the same, or nearly so, for successive
age groups. The question, therefore, is whether I.Q. scoring or
P.C. scoring yields the greater consistency as regards
variability measures. Kuhlmann¹ has made two comparisons from
which he concludes that P.C.'s are (more exactly, can be) more
constant than I.Q.'s. His first comparison is based on the P.E.
(defined as Q3 - Q1 instead of the usual Q3 - Q1 divided by two)
of I.Q. and of P.C. for his own test scored by both methods. The
I.Q. variabilities fluctuate more, and are particularly high for
the upper age levels, as might be expected when a straight-line
growth curve is assumed as far as age 16. But this comparison
may not be a fair one, since his scale was constructed on the
basis of the Heinis curve, and therefore the P.C.'s might be
expected to make a better showing.

Kuhlmann's second comparison is somewhat more crucial in that
the variabilities for P.C.'s based on scaling according to the
Heinis curve are contrasted with the variabilities for I.Q.'s
based upon the new Stanford revision. The Kuhlmann and the new
Stanford-Binet may be thought of as the best tests yet
constructed on the basis of their respective underlying
assumptions. It is doubtful whether either test approaches
perfection, and it is known that there are real differences in
variabilities for the new Stanford-Binet. The question is
whether the fluctuation in standard deviations from age to age
for I.Q.'s (standardization data of the new Stanford-Binet
revision) is greater than the variation of P.E.'s for P.C.'s
(Kuhlmann's data). Kuhlmann's chief deductions are based on a
consideration of data for ages 6 to 16; therefore we will limit
our discussion to these ages.

In Table 51 will be found the S.D.'s (mean values for Forms L
and M) for I.Q. distributions and the P.E.'s (interquartile
range) for P.C.'s. The latter values come from Table XXVIII of
Kuhlmann. The former have been rounded off so that they will not
be reported any more exactly than the available P.E.'s for the
personal constant. Kuhlmann pictures these values in his Fig. I,
page 91, and concludes that I.Q.'s are less constant in
variation than are P.C.'s. He believes that the 'evidence is
conclusive as to the relative merits of I.Q. and [...] [i.e.,
P.C.] scores.' The P.C. remains 'quite constant at all levels
above six ... whereas the I.Q. does change at varying rates ...'
These conclusions were evidently based upon the fact that the
range of S.D.'s for I.Q.'s is from 13 to 20 as opposed to the
range for P.E.'s of P.C.'s, which is from 6 to 9. If we deal
with averages, we find that the former variabilities center
about a mean of 17 and yield an average deviation of 1.27 and an
S.D. of 1.76, whereas the mean for the latter is 8.1 with an
average deviation of .83 and a sigma of 1.01. Insofar as
constancy depends upon stable variabilities, it would seem that
the P.C. would permit greater constancy, and that Kuhlmann was
correct.

¹ F. Kuhlmann, Tests of Mental Development. Minneapolis:
Educational Test Bureau, 1939; see pages 86-93.

TABLE 51

COMPARISON OF FLUCTUATIONS OF VARIABILITY MEASURES FOR I.Q.'S
AND P.C.'S: S.D. FOR I.Q.'S AND P.E. (INTERQUARTILE, NOT
SEMI-INTERQUARTILE RANGE) FOR P.C.'S

Age          6   7   8   9  10  11  12  13  14  15  16

S.D. I.Q.   13  16  16  17  16  18  20  18  [...]

P.E. P.C.    9   8   7   8   8   9  [...]        6   9


But if we examine the data in a different manner, a different
and more valid conclusion will emerge. This is a situation
wherein the variation of measures of variability must be
considered in a relative sense in order to make a proper
allowance for the fact that we are dealing with measures based
on noncomparable units. One would not be justified in concluding
that the real variation in intelligence is greater when measured
in terms of I.Q. than when gauged in P.C.'s simply because the
S.D. for a distribution of I.Q.'s is numerically greater than
the S.D. for a distribution of P.C.'s. It is obvious that the
variability of I.Q.'s is numerically larger than that for
P.C.'s; hence when comparing the variation of the measures of
the variabilities, we must take into account this difference in
absolute value, which is a function of the original unit used.
In other words we cannot compare directly the average deviations
of .83 and 1.27, or the sigmas of 1.01 and 1.76, any more than
one can compare numerical, untransformed values based upon
inches and centimeters. When we take .83 and 1.27, or 1.01 and
1.76, as measures of variation relative to the respective means,
8.1 and 17, we see at once that the variation of the P.E.'s for
the personal constant is in reality greater than the variation
of S.D.'s for the intelligence quotient. There can be no
objection to using the coefficient of variability in this case,
since the means, 8.1 and 17, can be thought of as distances
above real zero points, i.e., the points of no variability.
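
The relative comparison described in the preceding paragraph
amounts to dividing each measure of fluctuation by its own mean.
The sketch below carries out that arithmetic on the summary
figures quoted in the text (means of 17 and 8.1, average
deviations of 1.27 and .83, sigmas of 1.76 and 1.01); the
function and variable names are ours.

```python
def relative_variation(dispersion, mean):
    """Coefficient of variability: dispersion as a share of the mean."""
    return dispersion / mean

# S.D.'s of I.Q. distributions, ages 6 to 16 (mean of 17).
iq_by_average_deviation = relative_variation(1.27, 17.0)
iq_by_sigma = relative_variation(1.76, 17.0)

# P.E.'s of P.C. distributions, ages 6 to 16 (mean of 8.1).
pc_by_average_deviation = relative_variation(0.83, 8.1)
pc_by_sigma = relative_variation(1.01, 8.1)

for label, value in (
    ("I.Q., average deviation", iq_by_average_deviation),
    ("P.C., average deviation", pc_by_average_deviation),
    ("I.Q., sigma", iq_by_sigma),
    ("P.C., sigma", pc_by_sigma),
):
    print(label, round(100 * value, 1), "per cent of the mean")
```

By either measure the P.C. figure is the larger share of its
mean, which is the sense in which the P.E.'s fluctuate more than
the S.D.'s.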

Another way of considering the figures in Table 51 is to examine
the range of variabilities. Thus for I.Q. scoring the S.D.'s
range from 13 to 20, the larger being 54 per cent greater than
the smaller; for P.C. scoring the P.E.'s range from 6 to 9, the
latter being 50 per cent greater than the former. The problem
may be viewed in still another manner. If the I.Q. rating of an
individual on the new Stanford-Binet did remain constant, a
score of 113 at age 6 would correspond to the 84th percentile,
while a score of 113 at age 12 would fall at the 74th
percentile. If the P.C. rating of an individual on the Kuhlmann
test did remain constant, a score of 103 at age 15 would be the
equivalent of the 75th percentile, and the same score at age 16
would be near the 66th percentile. These shifts in percentile
ratings, which represent the maximum to be expected in the case
of each scoring scheme, are so similar that it is impossible to
adjudge that one method fares better than the other.
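
The percentile shifts cited above can be approximated from the
normal curve. The sketch below rests on assumptions that should
be kept in view: both I.Q.'s and P.C.'s are treated as normally
distributed about a mean of 100, and the interquartile range is
converted to a sigma by the normal-curve relation between
quartile and standard deviation. The P.E. values of 6 at age 15
and 9 at age 16 are read from Table 51.

```python
from statistics import NormalDist

UNIT_NORMAL = NormalDist()
QUARTILE_DEVIATE = UNIT_NORMAL.inv_cdf(0.75)   # about 0.674 sigma

def percentile_from_sd(score, sd, mean=100.0):
    """Percentile rank of a score, given the sigma at that age."""
    return 100.0 * UNIT_NORMAL.cdf((score - mean) / sd)

def percentile_from_iqr(score, iqr, mean=100.0):
    """Same, starting from the interquartile range (Q3 - Q1)."""
    sd = iqr / (2.0 * QUARTILE_DEVIATE)
    return percentile_from_sd(score, sd, mean)

# I.Q. of 113 where the S.D. is 13 (age 6) and 20 (age 12).
iq_age_6 = percentile_from_sd(113, 13)
iq_age_12 = percentile_from_sd(113, 20)

# P.C. of 103 where the P.E. is 6 (age 15) and 9 (age 16).
pc_age_15 = percentile_from_iqr(103, 6)
pc_age_16 = percentile_from_iqr(103, 9)

print(round(iq_age_6), round(iq_age_12))
print(round(pc_age_15), round(pc_age_16))
```

A drop of about ten percentile points for the I.Q. and a drop of
similar size for the P.C. is what makes the two schemes hard to
tell apart on this criterion.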
It would appear, therefore, that the relative merits of mental
ages and I.Q.'s versus Heinis mental units and P.C.'s cannot be
decided on the basis of the difference in the success with which
their respective advocates have met one of the conditions
necessary for constancy of indices of intellectual brightness.
Chapter XII
SUMMARY


The materials which have been presented herein are complementary
and supplementary to Terman and Merrill's Measuring
Intelligence. Some chapters contain more detail on certain
topics than was feasible to give in the previous volume, while
other chapters present data not heretofore reported. Some parts
of the present work are devoted to data involving total scores;
other parts are concerned with data on items. Any summary of
such a large mass of data is indeed difficult, since brevity
means the omission of necessary qualifications. It is hoped,
however, that the following résumé will serve a useful purpose.

I.Q. Distributions. - The distributions of I.Q.'s for Forms L
and M tend to approach the normal curve type. The discrepancies,
although in some instances statistically significant, are
nevertheless small in magnitude. No conclusions regarding the
distribution of intellect were drawn; as measured by the new
Stanford-Binet, intelligence is for all practical purposes
distributed in the normal fashion.

I.Q.'s and School Progress. - The analysis of I.Q.'s by
age-grade location provides interesting information about
intellect as related to school progress. The modal age-grade
(normal-progress) groups are average in I.Q., while those
accelerated or retarded by one grade have I.Q.'s which average
about 11 points above or below the general average. Those who
are two grades ahead or behind deviate about 22 I.Q. points from
the averages for the modal age groups. There seems to be no
difference in I.Q. variability for age groups as opposed to
grade groups, but as regards mental ages the grade groups are
somewhat more homogeneous than age groups. Within a grade group,
however, one finds a rather wide range of mental ages.

I.Q. by Occupation and Residence. - The data on I.Q. differences
associated with occupational levels and with urban, suburban,
and rural residence tend to confirm previous findings. Such
differences as exist emerge at the early ages and continue
throughout the age range here utilized.

Sibling Resemblances. - The sibling resemblances reported are of
particular interest because of the fact that highly reliable
I.Q.'s (average of Forms L and M) are involved and because all
the resemblance coefficients are based on groups of typical I.Q.
variation. For siblings of all ages, 2 to 18, the coefficient
for 384 pairs is .53; for 42 pairs of preschool age, .55; and
for 119 pairs, one of each pair being of preschool age, the
older being between ages 6 and 18, the resemblance is .52.

Sex Differences. - Sex differences in I.Q. tend to be small:
about 3 points in favor of girls at the preschool ages and about
2 points in favor of boys at the later ages. These differences,
although not large, are near the borderline of statistical
significance. Some of the eliminated and also some of the
retained items yielded highly significant differences in per
cents passing. It was suggested that the student of sex
difference might profitably direct his attention toward items
rather than total scores, which may mask real differences
between the sexes.

Reliability. - Chapter VI contains a detailed analysis of
reliability. It is shown that the standard error of measurement
for the I.Q. is definitely related to the magnitude of the I.Q.
The size of the measurement errors as well as the equivalent
reliability coefficients were deduced by averaging the results
obtained by two different methods. The standard errors of
measurement, as determined for ages 6 to 13, range from 2.8 for
low I.Q.'s to 5.3 for I.Q.'s in the higher brackets, and the
equivalent reliability coefficients range from .97 down to .90,
the higher reliability being associated with the lower I.Q.'s.
Thus it is no longer permissible to speak of 'the' reliability
of a scale of the Binet type, and it is likely that the same
thing is true for group tests which utilize the mental age-I.Q.
concepts.
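
The correspondence between a standard error of measurement and
its 'equivalent' reliability coefficient follows from the
familiar relation in which the error is sigma times the square
root of one minus the reliability. The sketch below applies that
relation to the two errors quoted above. The I.Q. sigma of 16.4
is an assumption supplied for the illustration; the summary does
not state which sigma entered the original computation.

```python
ASSUMED_IQ_SIGMA = 16.4   # assumption for illustration only

def equivalent_reliability(sem, sigma=ASSUMED_IQ_SIGMA):
    """Reliability implied by SEM = sigma * sqrt(1 - r)."""
    return 1.0 - (sem / sigma) ** 2

r_low_iq = equivalent_reliability(2.8)    # error for low I.Q.'s
r_high_iq = equivalent_reliability(5.3)   # error for high I.Q.'s
print(round(r_low_iq, 2), round(r_high_iq, 2))
```

Because the error term is squared, a standard error not quite
twice as large lowers the coefficient by several hundredths,
which is why a single figure cannot describe the whole I.Q.
range.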

Scatter. - The spread of individual performance, or scattering
of successes and failures, was discussed in Chapter VII. Scatter
is, of course, the result of several different factors: item
unreliability, low intercorrelations among items, lack of high
correlation of items with age (these are highest for items at
the lower end of the scale), the presence of a series of items
which call for some special ability, and lastly faulty age
placement of items. These are all functions of the scale; the
first three are inescapable, while the last two can be, and in
the New Revision have been, fairly well eliminated. One feature
has definitely contributed to scatter, viz., the presence of
recurring tests and recurring test situations. A reason for an
apparently greater scatter on the new Stanford-Binet, as
compared to the 1916 Revision, is the inclusion of items at
additional age levels. Scatter may also be a function of chance
motivational factors in the testee, but since we have shown that
the form versus form reliability of scatter scores approaches
the vanishing point, it becomes difficult to see how any
clinical meaning can be attached to the concept of scatter.

Per Cents Passing. - Table 26 of Chapter VIII contains the per
cents passing items by age. The corresponding curves are
steepest for the items at year level II, and become less steep
for higher levels. They are skewed rather than normal ogives.
Although the curves for items located at a particular age level
do not cross the ordinate for that age at the 50 per cent point
of difficulty, there are items in the scale which are of medium
difficulty for each age, except age 6. This lack of items of
medium difficulty at this age was offered as an explanation for
the low observed I.Q. variability for that age group. Reasons
were given for believing that our data on per cents passing may
not be ideal as basic data for deriving a mental growth curve.

Factor Analyses. - The fourteen factor analyses reported were
arranged so as to include each item in at least one analysis and
to provide overlapping items between adjacent analyses. The
results indicate that the several items included in a given
analysis tend to be saturated, though in varying degrees, with a
common factor. This was not surprising since the methods used in
selecting items were such as to favor this outcome. There is
some evidence for minor group factors, but the sampling errors
are so large as to make difficult any very specific deductions
concerning these factors. It would appear that the items in the
scale are not measuring such a hodgepodge of abilities as some
have supposed. Presumptive evidence was presented to the effect
that the common factors at successive levels are nearly
identical. Insofar as this is true and insofar as the amount of
variance due to possible group factors is small, we have
evidence that I.Q.'s earned on the New Revision are comparable
quantitatively and qualitatively.

Special Scales. - The materials on special scales have been
presented with the idea of drawing attention to the inadequacies
of scores based on so few items. The vocabulary test does
constitute a fair measure of the type of general intelligence
measured by the Binet test as a whole, but of course one cannot
conclude from this that intelligence depends upon vocabulary
ability rather than the converse. Items chosen as depending less
on the language factor were scored as a possible non-verbal
scale of intelligence, but the small number of available items
has militated against respectable reliability and validity.
Perhaps if the non-verbal items were augmented by other similar
items, an adequate non-verbal scale could be constructed. The
'memory' scale is not reliable enough for diagnostic use, and
even if it were, one might raise the question as to whether
'memory' rather than general intelligence is being tapped. After
all, the so-called memory items were originally selected, not
because they were memory items but because they satisfied
certain criteria as measures of general intelligence. Those who
need scales for measuring 'pure' memory and other special
abilities should look elsewhere.

Units of Measurement. - Chapter XI was devoted to the problem of
units of measurement and to a discussion of some of the
objections raised by one of the critics of age scales. As
regards units of measurement, we have presented an argument
which, it seems to us, completely nullifies the current notion
that standard scores, or T-scores, are 'truly' equal units. Our
examination of Kuhlmann's claim that the personal constant of
Heinis is more constant than the I.Q. has led us to doubt the
tenability of this claim.
NOTE ON SPURIOUS INDEX CORRELATION
BETWEEN I.Q.'S


It has long been known that the correlation between two indices
having a common variable denominator may be spurious, but in
four recent statistical textbooks the 'may be' has been
misconstrued as meaning 'is' or 'will be.' When sets of I.Q.'s
from two tests are correlated, it is said by Garrett that the
correlation will be spurious, by Guilford that such an 'r is
spuriously high,' by Cooke that this is a 'source of spurious
relationship,' while Peters and Van Voorhis cite the correlation
of I.Q.'s, also E.Q.'s and A.Q.'s, as examples of spurious
correlation.¹ It is the purpose of this note to show under what
specific conditions such a correlation is or is not spurious.

¹ Cf. H. E. Garrett, Statistics in Psychology and Education. New
York: Longmans Green Company, 1937, p. [...]; J. P. Guilford,
Psychometric Methods. New York: McGraw-Hill Book Company, 1936,
p. 374; D. H. Cooke, Minimum Essentials of Statistics. New York:
Macmillan Company, 1936, p. 187; C. C. Peters and W. R. Van
Voorhis, Statistical Procedures and Their Mathematical Bases.
New York: McGraw-Hill Book Company, 1940, p. 237.

We have approached this problem by two different methods, but
since each leads to the same conclusions, we will here present
the less elaborate approach. Indeed, the matter seems so
elementary that we should be apprehensive about this very simple
solution if its outcome had not been checked by more complex
reasoning. It is self-evident that the correlation between
I.Q.'s for age constant cannot be spurious, so let us set up the
correlation between two I.Q.'s, x and y, in the form of a
partial with age, a, to be partialed out. Thus

$$r_{xy.a} = \frac{r_{xy} - r_{xa}\, r_{ya}}{\sqrt{1 - r_{xa}^{2}}\,\sqrt{1 - r_{ya}^{2}}}$$

from which it is obvious that r_xy, the correlation between
I.Q.'s, will be spuriously high if both sets of I.Q.'s are
correlated in the same direction with age, spuriously low if
these correlations are of opposite signs, and not spurious at
all when the I.Q.'s are uncorrelated with age. Certainly the
element of spuriousness is negligible for slight, e.g., ordinary
chance sampling, departures of r_ya and r_xa from zero. For
instance, suppose r_xa = r_ya = .30 (values this large will
seldom arise as a result of sampling errors when N exceeds 100)
and suppose r_xy is near .80; then the spuriousness will be less
than .02.
An ideally constructed age scale will yield a correlation
between I.Q. and age of zero for unselected cases. Although the
New Revision meets this ideal, it must not be forgotten that
groups can be so selected as to produce a non-chance correlation
between I.Q. and age; in particular, a single school-grade group
will tend to yield a negative correlation for these variables.
We agree with the conclusion reached by Jackson in a recent
paper¹ to the effect that every case involving the correlation
of I.Q.'s must be considered individually, but we do not share
his general alarm about the statistical analysis of I.Q.'s being
meaningless. It might also be remarked that one of his
simplifying assumptions, namely that M.A. variability equals
that for C.A., is not tenable, not even as an approximation. For
both age and grade sampling, the former tends to be the greater.
Evidence for this may be found in Chapter III.


¹ R. W. B. Jackson, "Some Pitfalls in the Statistical Analysis
of Data Expressed in the Form of IQ Scores," J. Educ. Psychol.,
1940, 31, 677-685.
ADJUSTMENT OF I.Q.'S FOR ATYPICAL VARIABILITY
AT CERTAIN AGES


One of the requisites for an entirely satisfactory
age scale is that the variability of the distributions of
I.Q.'s be reasonably equal for successive ages. It has
been admitted elsewhere (Measuring Intelligence, page
40) that the differences in variability for the New Revision
are likely non-chance, and in Chapter VIII of this
volume a factor was mentioned which possibly explains
the low variability in the vicinity of ages 5 to 6, and the
high variability at age 12. There seems to be no obvious
reason for the high variability at ages 2-1/2 and 15.
Since these differences mean that I.Q.'s for individuals
are somewhat lacking in comparability, we present herewith
a table by which one can make an adjustment to the
I.Q.'s earned by individuals at certain ages. At the top
of Table 52 will be found [...] If, for example, [...]
and is 2 years [...] would be 117; [...]
TABLE 52
I.Q. ADJUSTMENTS FOR VARIABILITY DIFFERENCES

                    Age (years-months)
Obtained    2-4 to    4-10 to    11-6 to    14-6 to
I.Q.'s       3-3        6-6       12-5       15-5
  148        140        159        142        143
  146        139        157        140        141
  144        137        154        138        139
  142        135        152        137        137
  140        134        149        135        136
  138        132        147        133        134
  136        130        144        131        132
  134        129        142        130        130
  132        127        139        128        128
  130        125        137        126        127
  128        124        134        124        125
  126        122        132        123        123
  124        120        130        121        121
  122        119        127        119        120
  120        117        125        117        118
  118        115        122        116        116
  116        113        120        114        114
  114        112        117        112        112
  112        110        115        110        111
  110        108        112        109        109
  108        107        110        107        107
  106        105        107        105        105
  104        103        105        103        104
  102        102        102        102        102
  100        100        100        100        100
Appendix B

TABLE 52 (Cont.)
I.Q. ADJUSTMENTS FOR VARIABILITY DIFFERENCES

                    Age (years-months)
Obtained    2-4 to    4-10 to    11-6 to    14-6 to
I.Q.'s       3-3        6-6       12-5       15-5
   98         98         98         98         98
   96         97         95         97         96
   94         95         93         95         95
   92         93         90         93         93
   90         92         88         91         91
   88         90         85         90         89
   86         88         83         88         88
   84         87         80         86         86
   82         85         78         84         84
   80         83         75         83         82
   78         81         73         81         80
   76         80         70         79         79
   74         78         68         77         77
   72         76         66         76         75
   70         75         63         74         73
   68         73         61         72         72
   66         71         58         70         70
   64         70         56         69         68
   62         68         53         67         66
   60         66         51         65         64
   58         65         48         63         63
   56         63         46         62         61
   54         61         43         60         59
   52         60         41         58         57
   50         58         39         56         56
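
Use of Table 52 amounts to a simple lookup by obtained I.Q. and
chronological age. The sketch below is illustrative only: it
copies just five rows of the table, the names AGE_BANDS, TABLE_52
and adjusted_iq are hypothetical, and it assumes that ages falling
outside the four tabled age ranges call for no adjustment.

```python
# Age bands of Table 52, converted from years-months to months (inclusive).
AGE_BANDS = [
    (28, 39),    # 2-4 to 3-3
    (58, 78),    # 4-10 to 6-6
    (138, 149),  # 11-6 to 12-5
    (174, 185),  # 14-6 to 15-5
]

# A few rows copied from Table 52: obtained I.Q. -> adjusted I.Q. per band.
TABLE_52 = {
    130: (125, 137, 126, 127),
    120: (117, 125, 117, 118),
    110: (108, 112, 109, 109),
    100: (100, 100, 100, 100),
    90: (92, 88, 91, 91),
}


def adjusted_iq(obtained: int, years: int, months: int) -> int:
    """Return the tabled adjustment for an obtained I.Q. at a given age."""
    age = 12 * years + months
    for column, (low, high) in enumerate(AGE_BANDS):
        if low <= age <= high:
            # Raises KeyError for rows not copied into this sketch.
            return TABLE_52[obtained][column]
    # Assumption: no adjustment outside the four tabled age bands.
    return obtained


print(adjusted_iq(120, 2, 6))
```

A complete version would carry every row of the table; values
between the tabled even I.Q.'s are not covered by this sketch.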


Appendix C
ITEM CORRELATIONS WITH TOTAL SCORE 


The retention of an item for the final forms de- 
pended in part upon its correlation with the total score 
based upon all the items of Forms L and M. We are 
giving in the following tables the biserial correlations 
for each item as computed on the age corresponding to 
its age placement. Actually, for many of the items, cor- 
relations were also determined at ages adjacent to the 
respective placement ages. In some cases, e.g. vocab- 
ulary, product-moment r's were also computed. Although 
all the available correlations were taken into considera- 
tion when retaining or eliminating items, we are present- 
ing only one for each item. This should be sufficient 
to give the reader some notion of the degree of relation- 
ship between items and the total score. 
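
For readers unfamiliar with the statistic, a biserial correlation
between a pass-fail item and the total score can be sketched as
follows. This is the usual textbook formula, assumed here rather
than quoted from this volume: the difference between the mean
totals of those passing and those failing, divided by the standard
deviation of all totals, times pq/y, where y is the ordinate of
the unit normal curve at the point of dichotomy. The function name
and the ten cases are hypothetical.

```python
from statistics import NormalDist, fmean, pstdev


def biserial(passed, totals):
    """Biserial r between a pass/fail item and a continuous total score."""
    pass_scores = [t for ok, t in zip(passed, totals) if ok]
    fail_scores = [t for ok, t in zip(passed, totals) if not ok]
    if not pass_scores or not fail_scores:
        raise ValueError("need at least one pass and one failure")
    p = len(pass_scores) / len(totals)  # proportion passing the item
    q = 1.0 - p
    # Ordinate of the unit normal curve where it is cut into p and q.
    unit_normal = NormalDist()
    y = unit_normal.pdf(unit_normal.inv_cdf(p))
    sd = pstdev(totals)  # standard deviation of all total scores
    return (fmean(pass_scores) - fmean(fail_scores)) / sd * (p * q / y)


# Hypothetical data: ten cases, the item passed mostly by higher scorers.
passed = [False, False, False, True, False, True, True, True, True, True]
totals = [31, 34, 36, 38, 40, 43, 45, 47, 50, 54]
r = biserial(passed, totals)
print(round(r, 2))
```

Unlike a product-moment r, a biserial coefficient is not bounded
by 1.00 when the total-score distribution departs from normality,
which is one reason values from small samples should be read with
caution.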

The tables given herewith will also serve as a
reference for identifying tests by age location, number,
and name. More detailed descriptions of the tests
can, of course, be found in Measuring Intelligence.




TABLE 53

FORM L BISERIAL CORRELATIONS:
ITEMS VERSUS TOTAL SCORE

Location      Name of test

L,II,1*       Three-hole form board
    2         Identifying objects by name
    3         Identifying parts of the body
    4         Block building: tower
    5         Picture vocabulary
    6*        Word combinations
    alt.      Obeying simple commands

L,II-6,1      Identifying objects by use
    2         Identifying parts of the body
    3         Naming objects
    4         Picture vocabulary
    5         Repeating 2 digits
    6         Three-hole form board: rotated
    alt.      Identifying objects by name

L,III,1       Stringing beads
    2         Picture vocabulary
    3         Block building: bridge
    4         Picture memories
    5         Copying a circle
    6         Repeating 3 digits
    alt.      Three-hole form board: rotated

L,III-6,1     Obeying simple commands
    2         Picture vocabulary
    3         Comparison of sticks
    4         Response to pictures I
    5         Identifying objects by use
    6         Comprehension I
    alt.      Drawing designs: cross

[The scoring-standard and r columns are illegible in the source;
duplicate-test marks survive only where shown.]

*Indicates duplicate test.

TABLE 53 (Cont.)

FORM L BISERIAL CORRELATIONS

Location      Name of test

L,IV,1        Picture vocabulary
    2         Naming objects from memory
    3         Picture completion: man
    4         Pictorial identification
    5         Discrimination of forms
    6         Comprehension II
    alt.      Memory for sentences I

L,IV-6,1      Aesthetic comparison
    2         Repeating 4 digits
    3         Pictorial likenesses and differences
    4         Materials
    5         Three commissions
    6         Opposite analogies I
    alt.      Pictorial identification

L,V,1         Picture completion: man
    2         Paper folding: triangle
    3         Definitions
    4         Copying a square
    5         Memory for sentences II
    6         Counting four objects
    alt.      Knot

L,VI,1        Vocabulary
    2         Copying a bead chain from memory I
    3         Mutilated pictures
    4         Number concepts
    5         Pictorial likenesses and differences
    6         Maze tracing

[The scoring-standard and r columns are illegible in the source
apart from unassignable fragments (16; .13); duplicate-test marks
did not survive.]

*Indicates duplicate test.


TABLE 53 (Cont.)

FORM L BISERIAL CORRELATIONS

Location      Name of test

L,VII,1       Picture absurdities I
    2         Similarities: two things
    3         Copying a diamond
    4         Comprehension III
    5         Opposite analogies I
    6         Repeating 5 digits

L,VIII,1      Vocabulary
    2         Memory for stories: the wet fall
    3         Verbal absurdities I
    4         Similarities and differences
    5         Comprehension IV
    6         Memory for sentences III

L,IX,1        Paper cutting I
    2         Verbal absurdities II
    3         Memory for designs
    4         Rhymes: new form
    5         Making change
    6         Repeating 4 digits reversed

L,X,1         Vocabulary
    2         Picture absurdities II
    3         Reading and report
    4         Finding reasons I
    5         Word naming
    6         Repeating 6 digits

[The scoring-standard and r columns are illegible in the source;
duplicate-test marks did not survive.]


TABLE 53 (Cont.)
FORM L BISERIAL CORRELATIONS

                                                 Scoring
Location      Name of test                       standard

L,XI,1        Memory for designs                 1½
    2         Verbal absurdities III             2
    3*        Abstract words I                   3
    4*        Memory for sentences IV            1
    5         Problem situation                  +
    6*        Similarities: three things         3

L,XII,1       Vocabulary                         14
    2         Verbal absurdities II              4
    3*        Response to pictures II            +
    4         Repeating 5 digits reversed        1
    5         Abstract words II                  2
    6         Minkus completion                  2

L,XIII,1      Plan of search                     +
    2         Memory for words                   1
    3         Paper cutting I                    2
    4*        Problems of fact                   2
    5*        Dissected sentences                2
    6         Copying a bead chain
              from memory II                     +

L,XIV,1       Vocabulary                         16
    2         Induction                          +
    3*        Picture absurdities III            +
    4         Ingenuity                          1
    5         Orientation: direction I           3
    6         Abstract words II                  3

[The r column did not survive in the source.]

*Indicates duplicate test.


TABLE 53 (Cont.)

FORM L BISERIAL CORRELATIONS

                                                     Scoring
Location      Name of test                           standard

L,A.A.,1      Vocabulary                             20
    2         Codes                                  13
    3         Differences between abstract words
    4*        Arithmetical reasoning
    5         Proverbs I
    6*        Ingenuity
    7         Memory for sentences V
    8         Reconciliation of opposites

L,S.A.I,1     Vocabulary
    2         Enclosed box problem
    3         Minkus completion
    4         Repeating 6 digits reversed
    5         Sentence building
    6         Essential similarities

L,S.A.II,1    Vocabulary
    2         Finding reasons II
    3         Repeating 8 digits
    4         Proverbs II
    5         Reconciliation of opposites
    6*        Repeating thought of passage:
              value of life

L,S.A.III,1   Vocabulary
    2         Orientation: direction II
    3         Opposite analogies II
    4         Paper cutting II
    5         Reasoning
    6         Repeating 9 digits

[Apart from the two entries shown, the scoring-standard column
and the whole r column are illegible in the source;
duplicate-test marks survive only where shown.]

*Indicates duplicate test.
TABLE 54

FORM M BISERIAL CORRELATIONS:
ITEMS VERSUS TOTAL SCORE

Location      Name of test

M,II,1        Delayed response
    2         Identifying objects by name
    3         Identifying parts of the body
    4         Three-hole form board
    5         Picture vocabulary
    6         Word combinations
    alt.      Naming objects

M,II-6,1      Identifying objects by use
    2         Motor coordination
    3         Naming objects
    4         Picture vocabulary
    5         Repeating 2 digits
    6         Obeying simple commands
    alt.      Stringing beads

M,III,1       Block building: bridge
    2         Picture vocabulary
    3         Identifying objects by use
    4         Drawing a vertical line
    5         Naming objects
    6         Repeating 3 digits
    alt.      Three-hole form board: rotated

M,III-6,1     Comparison of balls
    2         Patience: pictures
    3         Discrimination of animal pictures
    4         Response to pictures I (Level I)
    5         Sorting buttons
    6         Comprehension I
    alt.      Matching objects

[The scoring-standard and r columns are illegible in the source;
duplicate-test marks did not survive.]

*Indicates duplicate test.


TABLE 54 (Cont.)

FORM M BISERIAL CORRELATIONS

Location      Name of test

M,IV,1        Picture vocabulary
    2         Stringing beads
    3         Opposite analogies I
    4         Pictorial identification
    5         Number concept of two
    6         Memory for sentences I
    alt.      Discrimination of animal pictures

M,IV-6,1      Discrimination of animal pictures
    2         Definitions
    3         Repeating 4 digits
    4         Picture completion: bird
    5         Materials
    6         Comprehension II
    alt.      Patience: pictures

M,V,1         Picture vocabulary
    2         Number concept of three
    3         Pictorial similarities and differences
    4         Patience: rectangles
    5         Comprehension II
    6         Mutilated pictures
    alt.      Knot

M,VI,1        Number concepts
    2         Copying a bead chain
    3         Differences
    4         Response to pictures I (Level II)
    5         Counting 13 pennies
    6         Opposite analogies I

[The scoring-standard and r columns are illegible in the source
apart from unassignable fragments (12; .68); duplicate-test marks
did not survive.]

*Indicates duplicate test.


TABLE 54 (Cont.)

FORM M BISERIAL CORRELATIONS

Location      Name of test

M,VII,1       Giving the number of fingers
    2         Memory for sentences II
    3         Picture absurdities I
    4         Repeating 3 digits reversed
    5         Sentence building I
    6         Counting taps

M,VIII,1      Comprehension III
    2         Similarities: two things
    3         Verbal absurdities I
    4         Naming the days of the week
    5         Problem situations
    6         Opposite analogies II

M,IX,1        Memory for designs I
    2         Dissected sentences I
    3         Verbal absurdities II
    4         Similarities and differences
    5         Rhymes: old form
    6         Repeating 4 digits reversed

M,X,1         Block counting
    2         Memory for stories I:
              the school concert
    3         Verbal absurdities III
    4         Abstract words I
    5         Word naming: animals
    6         Repeating 6 digits

[The scoring-standard and r columns are illegible in the source
apart from r fragments that cannot be assigned to rows: .65, .62,
.65, .10 (?), .61, .31; .71 (?), .83, .74. Duplicate-test marks
did not survive.]

TABLE 54 (Cont.)

FORM M BISERIAL CORRELATIONS

Location      Name of test

M,XI,1        Finding reasons
    2         Copying a bead chain from memory
    3         Verbal absurdities II
    4         Abstract words II
    5         Similarities: three things
    6         Memory for sentences III

M,XII,1       Memory for designs II
    2         Response to pictures II
    3         Minkus completion
    4         Abstract words I
    5         Picture absurdities II
    6         Repeating 5 digits reversed

M,XIII,1      Plan of search
    2         Memory for stories II: acrobat
    3         Dissected sentences II
    4         Abstract words II
    5         Problems of fact
    6         Memory for sentences IV

M,XIV,1       Reasoning
    2*        Picture absurdities III
    3         Orientation: direction I
    4         Abstract words III
    5         Ingenuity
    6         Reconciliation of opposites

[The scoring-standard and r columns are illegible in the source;
duplicate-test marks survive only where shown.]

*Indicates duplicate test.


TABLE 54 (Cont.)

FORM M BISERIAL CORRELATIONS

Location      Name of test

M,A.A.,1      Abstract words III
    2         Ingenuity
    3         Opposite analogies III
    4         Codes I
    5         Proverbs I
    6         Orientation: direction I
    7         Essential differences
    8         Binet paper cutting

M,S.A.I,1     Minkus completion
    2         Opposite analogies IV
    3         Essential similarities
    4         Repeating 6 digits reversed
    5         Sentence building II
    6         Reconciliation of opposites

M,S.A.II,1    Proverbs II
    2         Ingenuity
    3         Essential differences
    4         Repeating 8 digits
    5         Codes II
    6         Repeating thought of passage I:
              value of life

M,S.A.III,1   Proverbs III
    2         Memory for sentences V
    3         Orientation: direction II
    4         Repeating 9 digits
    5         Opposite analogies IV
    6         Repeating thought of
              passage II: tests

[The scoring-standard and r columns are illegible in the source;
duplicate-test marks did not survive.]

*Indicates duplicate test.